(57) Abstract

The invention pertains to a process for producing a determined
polypeptide of interest or repeats thereof in a seed-forming plant. It
comprises: cultivating plants obtained from regenerated plant cells or
from seeds of plants obtained from said regenerated plant cells over
one or several generations, whose genetic patrimony, replicable with
said plants, comprises a precursor-coding nucleic acid sequence
encoding the precursor of a plant storage protein and placed under the
control of a seed promoter, said precursor-coding nucleic acid being
modified in a non-essential region of its relevant sequence, which
encodes the mature storage protein or a sub-unit thereof, with a
nucleic acid insert in appropriate reading phase relationship with the
surrounding part of said relevant sequence, said insert including a
determined segment encoding a heterologous determined polypeptide of
interest or repeats thereof linked to each other and, downstream and
upstream thereof, to the remainder parts of said relevant sequence
through codons encoding amino acid(s) which define selectively
cleavable border sites surrounding the peptide of interest in the
hybrid storage protein encoded by the so-modified relevant sequence of
said precursor-coding nucleic acid; recovering the seeds of the
cultivated plants and extracting the hybrid storage proteins contained
therein; cleaving out the peptide of interest from said hybrid storage
protein; and purifying and recovering the peptide of interest. The
polypeptide of interest may be a biologically active protein and/or a
labeled protein.

A process for the production of biologically active peptide via the
expression of modified storage seed protein genes in transgenic plants

The invention relates to a process for the production of useful
biologically active polypeptides through the modification of
appropriate plant genes.

The production of determined biologically active polypeptides in
easily purifiable form and useful quantities is still fraught, in most
instances, with considerable difficulties.

Alternative procedures are chemical synthesis or production by
genetically engineered microorganisms. The first is very expensive and
often does not result in polypeptides with the correct conformation.
The latter alternative is difficult due to problems of instability of
the polypeptide, intracellular precipitation, and purification of the
product in a pure form. In addition, some classes of peptides,
including hormonal peptides, are fully active only after further
processing such as correct disulfide bridge formation, acetylation,
glycosylation or methylation. In nature disulfide bridges are formed
with high efficiency because they are co-translationally catalysed by
protein disulfide isomerase during membrane translocation of the
precursors. The active form is then derived from the precursor by
proteolytic cleavage processes.

Peptides chemically synthesised or overproduced in prokaryotic systems
are generally obtained in a reduced form, and the disulfide bridges
must then be formed by mild oxidation of the cysteine residues. Since
one often starts from the fully denatured "scrambled" state of the
peptide, disulfide bridge formation is then a random process, during
which intermolecular bridges (yielding higher molecular weight
aggregates) and incorrect disulfide bonds (yielding inactive peptides)
may be generated in addition to the correctly folded peptide.

Using plant cells as systems for the production of determined peptides
has also been suggested, e.g. in PCT/US86/01599. There is no evidence
in that patent that the suggested methods, whose principle lies in
bringing said peptide constitutively to expression according to known
techniques (EP83112985.3), permit obtaining high expression levels
without disturbing the plant physiology and high yields in recovering
said peptides by separating them from plant proteins. This will
especially be the case when the whole plant is used as such and grown
in soil.

An object of the invention is to overcome these difficulties, to
provide economically valuable processes and genetically engineered
live matter which can be produced in large amounts, in which
determined polypeptides can both be synthesized in large amounts
without disturbing the physiology of said live matter and produced in
a form providing for a high degree of physiological activity common to
the wild type peptide having the same or substantially the same amino
acid sequences and can be easily recovered from said live matter.

More particularly the invention aims at providing genetically modified
plant DNA and plant live material including said genetically modified
DNA replicable with the cells of said plant material, which
genetically modified plant DNA contains sequences encoding said
determined polypeptides whose expression is under the control of a
given plant promoter which conducts said expression in at least one
stage of the development of the corresponding plants. This stage of
development is chosen in such a way that the expression occurs in
plant organs or tissue which are produced in high amounts and are
easily recoverable.

A further object of the invention is to take advantage of the capacity
of seed storage proteins to be produced in large amounts in plants and
to be expressed at a determined stage of development of said plants,
particularly at the seed formation stage. More particularly the
invention aims at taking advantage of the ease with which water
soluble storage proteins can be recovered from the corresponding plant
seeds.

The expression of foreign genes in plants is well established (De
Blaere et al., 1987). In several cases seed storage protein genes have
been transferred to other plants. In several cases it was shown that
within its new environment the transferred seed storage protein gene
is expressed in a tissue specific and developmentally regulated manner
(Beachy et al., 1985; Okamuro et al., 1986; Sengupta-Gopalan et al.,
1985; Higgins et al., 1986). This means that the transferred gene is
expressed only in the appropriate parts of the seed, and only at the
normal time. It has also been shown in at least one case that foreign
seed storage proteins are located in the protein bodies of the host
plant (Greenwood and Chrispeels, 1985). It has further been shown that
stable and functional messenger RNAs can be obtained if a cDNA, rather
than a complete gene including introns, is used as the basis for the
chimeric gene (Chee et al., 1986).

Seed storage proteins represent up to 90 % of total seed protein in
seeds of many plants. They are used as a source of nutrition for young
seedlings in the period immediately after germination. The genes
encoding them are strictly regulated and are expressed in a highly
tissue specific and stage specific fashion (Walling et al., 1986;
Higgins, 1984). Thus they are expressed almost exclusively in
developing seed, and different classes of seed storage proteins may be
expressed at different stages in the development of the seed. They are
generally restricted in their intracellular location, being stored in
membrane bound organelles called protein bodies or protein storage
vacuoles. These organelles provide a protease-free environment, and
often also contain protease inhibitors. These proteins are degraded
upon flowering, and are thought to serve as a nutritive source for
developing seeds. Simple purification techniques for several classes
of these proteins have been described.

Seed storage proteins are generally classified on the basis of
solubility and size (more specifically sedimentation rate, for
instance as defined by Svedberg (in Stryer, L., Biochemistry, 2nd ed.,
W.H. Freeman, New York, page 599)). A particular class of seed storage
proteins has been studied, the 2S seed storage proteins, which are
water soluble albumins and thus easily separated from other proteins.
Their small size also simplifies their purification. Several 2S
storage proteins have been characterised at either the protein or cDNA
levels (Crouch et al., 1983; Sharief and Li, 1982; Ampe et al., 1986;
Altenbach et al., 1987; Ericson et al., 1986; Scofield and Crouch,
1987; Josefsson et al., 1987; and work described in the present
application). 2S albumins are formed in the cell from two sub-units of
6-9 and 3-4 kilodaltons (kd) respectively, which are linked by
disulfide bridges.

The work in the references above showed that 2S albumins are
synthesized as a complex prepropeptide whose organization is shared
between the 2S albumins of many different species and is shown
diagrammatically for three of these species in figure 2. Several
complete sequences are shown in figure 2.

As to fig. 2 relative to protein sequences of 2S albumins, the
following observations are made. For B. napus, B. excelsa and A.
thaliana both the protein and DNA sequences have been determined. For
R. communis only the protein sequence is available (B. napus from
Crouch et al., 1983 and Ericson et al., 1986; B. excelsa from Ampe et
al., 1986, de Castro et al., 1987 and Altenbach et al., 1987; R.
communis from Sharief et al., 1982). Boxes indicate homologies, and
raised dots the position of the cysteines.

Comparison of the protein sequences at the beginning of the precursor
with standard consensus sequences for signal peptides reveals that the
precursor has not one but two segments at the amino terminus which are
not present in the mature protein, the first of which is a signal
sequence (Perlman and Halvorson, 1983) and the second of which has
been designated the amino terminal processed fragment (the so-called
ATPF). Signal sequences serve to ensure the cotranslational transport
of the nascent polypeptide across the membrane of the endoplasmic
reticulum (Blobel, 1980), and are found in many types of proteins,
including all seed storage proteins examined to date (Herman et al.,
1986). This is crucial for the appropriate compartmentalization of the
protein. The protein is further folded in such a way that correct
disulfide bridges are formed. This process is probably localized at
the luminal side of the endoplasmic reticulum membrane, where the
enzyme disulfide isomerase is localized (Roden et al., 1982; Bergman
and Kuehl, 1979). After translocation across the endoplasmic reticulum
membrane it is thought that most storage proteins are transported via
said endoplasmic reticulum to the Golgi bodies, and from the latter in
small membrane bound vesicles ("dense vesicles") to the protein bodies
(Chrispeels, 1983; Craig and Goodchild, 1984; Lord, 1985). That the
signal peptide is removed cotranslationally implies that the signals
directing the further transport of seed storage proteins to the
protein bodies must reside in the remainder of the protein sequence
present.

2S albumins contain sequences at the amino end of the precursor other
than the signal sequence which are not present in the mature
polypeptide. This is not general to all storage proteins. This amino
terminal processed fragment is labeled Pro in Fig. 1 and ATPF in
figure 1A.

In addition, as shown in figures 1 and 1A, several amino acids located
between the small and large sub-units in the precursor are removed
(labeled link in Fig. 1 and IPF in figure 1A, which stands for
internal processed fragment). Furthermore, several residues are
removed from the carboxyl end of the precursor (labeled Tail in Fig.
6c and CTPF in figure 1A, which stands for carboxyl terminal processed
fragment). The cellular location of these latter process steps is
uncertain, but is most likely the protein bodies (Chrispeels, 1983;
Lord, 1985). As a result of these processing steps the small sub-unit
(Sml. Sub) and large sub-unit remain. These are linked by disulfide
bridges, as discussed below.

When the protein sequences of 2S albumins of different plants are
compared, strong structural similarities are observed. This is more
particularly illustrated by figures 2 and 2A, which provide the amino
acid sequences of the small sub-unit and large sub-unit respectively
of representative 2S storage seed albumin proteins of different
plants, i.e.:

R. comm.  : Ricinus communis
A. thali. : Arabidopsis thaliana
B. napus  : Brassica napus
B. excel. : Bertholletia excelsa (Brazil nut)

It must be noted that in fig. 2 and 2A:

- the amino acid sequences of said sub-units extend over several
lines;

- the cysteine groups of the amino acid sequences of the exemplified
storage proteins and identical amino acids in several of said proteins
have been brought into vertical alignment; the hyphen signs which
appear in some of these sequences represent absent amino acids, in
other words direct linkages between the closest amino acids which
surround them;

- the amino acid sequences which are substantially conserved in the
different proteins are framed.

It will be observed that all the sequences contain eight cysteine
residues (the first and second ones in the small sub-unit, the
remainder in the large sub-unit) which can participate in disulfide
bridges as diagrammatically shown in fig. 3, which represents a
hypothetical model (for the purpose of the present discussion) rather
than a representation of the true structure, proven by
experimentation, of the 2S albumin of Arabidopsis thaliana. Said
hypothetical model has been inspired by the disulfide bridge mediated
loop formation of animal albumins, such as serum albumins (Brown,
1976), alpha-fetoprotein (Jagodzinski et al., 1987; Morinaga et al.,
1983) and the vitamin D binding protein, where analogous constant C-C
doublets and C-X-C triplets were observed (Yang et al., 1985).

Furthermore, the distances between the cysteine residues are
substantially conserved within each sub-unit, with the exception of
the distance between the sixth and seventh cysteine residues in the
large sub-unit. This suggests that these arrangements are structurally
important, but that some variation is permissible in the large
sub-unit between said sixth and seventh cysteines.
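
The spacing comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the two sequences are hypothetical toy strings, not the 2S albumin sequences of fig. 2, and the helper name `cysteine_spacings` is an assumption introduced here.

```python
def cysteine_spacings(seq):
    """Return the number of residues lying between consecutive cysteines."""
    positions = [i for i, aa in enumerate(seq) if aa == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]

# Hypothetical toy sequences (NOT real 2S albumin data), each containing a
# C-C doublet and a C-X-C triplet followed by a stretch of variable length.
toy_a = "AACCKKCLCAAAAAAC"
toy_b = "GGCCRRCMCSSC"

spacings_a = cysteine_spacings(toy_a)
spacings_b = cysteine_spacings(toy_b)

# Indices of the inter-cysteine gaps whose length differs between the two
# sequences; such gaps are candidates for tolerated length variation.
variable_gaps = [i for i, (x, y) in enumerate(zip(spacings_a, spacings_b)) if x != y]
```

In this toy case only the last gap differs in length, which mirrors the kind of observation made for the stretch between the sixth and seventh cysteines.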

The invention is based on the determination of the regions of the
storage protein which can be modified without an attendant alteration
of the properties and correct processing of said modified storage
protein in plant seeds of transgenic plants. This region
(diagrammatically shown in fig. 3 by an enlarged hatched portion) will
in the examples hereafter be referred to as the "hypervariable
region". Fig. 3 also shows the respective positions of the other parts
of the precursor sequence, including the "IPF" section separating the
small sub-unit and large sub-unit of the precursor, as well as the
number of amino acids (aa) in substantially conserved portions of the
protein sub-units cysteine residues. The processing cleavage sites are
shown by symbols^.

The seeds of many plants contain albumins of approximately the same
size as the storage proteins discussed above. However, for ease of
language the term "2S albumins" will be used herein to refer to seed
proteins whose genes encode a peptide precursor with the general
organization shown in figure 1 and which are processed to a final form
consisting of two sub-units linked by disulfide bridges. This is not
to be construed as indicating that the process described below is
exclusively applicable to such 2S albumins.

The process of the invention for producing a determined polypeptide of
interest comprises:

- cultivating plants obtained from regenerated plant cells or from
seeds of plants obtained from said regenerated plant cells over one or
several generations, wherein the genetic patrimony or information of
said plant cells, replicable within said plants, includes a nucleic
acid sequence, placed under the control of a seed-specific promoter,
which can be transcribed into the mRNA encoding at least part of the
precursor of a storage protein including the signal peptide of said
plant, said nucleic acid being hereafter referred to as the "precursor
encoding nucleic acid",

  . wherein said nucleic acid contains a nucleotide sequence
    (hereafter termed the "relevant sequence"), which relevant
    sequence comprises a non-essential region modified by a
    heterologous nucleic acid insert forming an open reading frame in
    reading phase with the non-modified parts surrounding said insert
    in said relevant sequence,

  . wherein said insert includes a nucleotide segment encoding said
    polypeptide of interest,

  . wherein said heterologous nucleotide segment is linked to the
    adjacent extremities of the surrounding non-modified parts of said
    relevant sequence by one or several codons whose nucleotides
    belong either to said insert or to the adjacent extremities or to
    both,

  . wherein said one or several codons encode one or several amino
    acid residues which define selectively cleavable border sites
    surrounding the peptide of interest in the hybrid storage protein
    or storage protein sub-unit encoded by the modified relevant
    sequence;

- recovering the seeds of the cultivated plants and extracting the
hybrid storage proteins contained therein;

- cleaving out the peptide of interest from said hybrid storage
protein at the level of said cleavage sites; and

- recovering the peptide of interest in a purified form.
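
The reading phase requirement on the insert can be illustrated with a minimal Python sketch. The DNA strings are hypothetical toy sequences, the function names are assumptions introduced here, and the check is only a necessary condition (insert length a multiple of three, no in-frame stop codon in the hybrid), not a full validation of a construct.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

def codons(dna):
    """Split a DNA string into successive complete triplets, starting at index 0."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]

def keeps_reading_phase(relevant, insert, position):
    """Return True if `insert`, placed at `position` in `relevant`, has a length
    that is a multiple of three and the hybrid contains no in-frame stop codon.
    `relevant` is assumed to be a coding segment given without its own stop codon."""
    if len(insert) % 3 != 0:
        return False  # the downstream codons would be shifted out of phase
    hybrid = relevant[:position] + insert + relevant[position:]
    return not any(c in STOP_CODONS for c in codons(hybrid))

# Hypothetical toy sequences, for illustration only.
relevant = "ATGGCTTGTCAACAAGTT"
insert = "AAAGGTGGTAAA"

in_phase = keeps_reading_phase(relevant, insert, 12)      # on a codon boundary
frameshift = keeps_reading_phase(relevant, "AAAGG", 12)   # length not a multiple of 3
off_boundary = keeps_reading_phase(relevant, insert, 13)  # creates an in-frame TAA
```

The third case shows why the position matters as well as the length: the same insert placed one nucleotide further is read in a different phase and here happens to introduce a stop codon.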

It will be appreciated that under the above-mentioned conditions each
and every cell of the cultivated plant will include the modified
nucleic acid. Yet the above defined recombinant or hybrid sequence
will be expressed at high levels only or mostly in the seed forming
stage of the cultivated plants and, accordingly, the hybrid protein
will be produced mostly in the seeds.

It will be understood that the "heterologous nucleic acid insert"
defined above consists of an insert which contains nucleotide
sequences which, at least in part, are foreign to the natural nucleic
acid encoding the precursor of the storage protein of the seeds or
plant cells concerned. Most generally the segment encoding the
polypeptide of interest will itself be foreign to the natural nucleic
acid encoding the precursor of said storage protein. Nonetheless, the
term "heterologous nucleic acid insert" does also extend to an insert
containing a segment as above-defined normally present in the genetic
patrimony or information of said seeds or plant cells, the
"heterologous" character of said insert then relating to the one or
several codons which surround it on both sides and which link said
segment to the non-modified parts of the nucleic acid encoding said
precursor. Under such last mentioned circumstances the invention thus
provides for a method which enables the production and easy separation
and recovery of a valuable protein normally produced in the plant
itself, either at the seed forming stage or at any other stage of the
development of the plant, and either in the protein bodies of the
seeds or any other location of said plant cells.

The "polypeptide of interest" will usually consist of a single
polypeptide or protein which, when cleaved out from the hybrid storage
proteins in the final stages of the process of this invention, will
retain or resume at least those of the biological properties sought to
be possessed by that single polypeptide or protein of interest. By way
of non-limitative examples of properties sought to be retained by the
polypeptide of interest, one may cite, e.g., enzymatic or therapeutic
activities, the capability of being recognized by determined
antibodies, immunogenic properties, for instance the capability of
eliciting in a living host antibodies which are able to neutralize
such peptide of interest or a pathogenic agent containing antigens
including the same or an analogous sequence of amino acids as said
"polypeptide of interest".

However the "polypeptide of interest" may also comprise repeats of a
unit, particularly of an individual peptide or polypeptide having any
desired biological activity, said units being joined with one another
over or through cleavable sites permitting the separation of the
biologically active repeats or units from one another. Though not
decisive, such cleavable sites are advantageously identical to, or
sensitive to the same cleaving means (e.g. a determined protease) as,
the above-defined "border cleavage sites" which enable the overall
"polypeptide of interest" to be cleaved out from the hybrid storage
protein. As a matter of fact, separation of the active units from one
another may then be achieved simultaneously with the above mentioned
"cleaving out" operations. Yet the different units or repeats may be
joined through different cleavage sites, whereby the separation of
said units from one another may be undertaken subsequent to the
"cleaving out" operations of said "polypeptide of interest" from the
hybrid storage protein.

The number of repetitive units in the polypeptide 
of interest will of course be dependent upon the maximum 
length of polypeptide of interest which may be 
incorporated in the storage protein concerned under the 
conditions defined herein. 

In the preceding definition of the process according to the invention
the so-called "non-essential region" of the relevant sequence of said
nucleic acid encoding the precursor consists of a region whose
nucleotide sequence can be modified either by insertion into it of the
above defined insert or by replacement of at least part of said
non-essential region by said insert, yet without modifying either the
resulting overall configuration of said hybrid storage protein as
compared to that of the non-modified natural storage protein or the
transport of the correspondingly modified nascent hybrid storage
protein into the abovesaid protein bodies.

In the present invention the precursor-coding nucleic acid referred to
above may of course originate from the same plant species as that
which is cultivated for the purpose of the invention. It may however
originate from another plant species, in line with the teachings of
Beachy et al., 1985 and Okamuro et al., 1986 already of record.

In a similar manner the seed-specific promoter may 
originate from the same plant species or from another, 
subject in the last instance to the capability of the host 
plant's polymerases to recognize it. 

Any method for the location of a non-essential region in a storage
protein can be used. Once this region is defined at the protein
sequence level, the corresponding region of the precursor encoding
nucleic acid can be altered. For instance, non-essential regions can
be located using methods based on the establishment of secondary and
tertiary protein structures by molecular modeling. Such models will
allow the identification of regions of the protein critical for its
configuration or interaction in higher order aggregations. In the
absence of such technology, the peptide sequences of analogous
proteins from various plant species can be compared. Those
subsequences which said peptide sequences have in common (and which
prima facie will support the presumption that they cannot be modified
without affecting the structure, processing, intracellular passage, or
packaging of the peptide in a deleterious way) can be distinguished
from those which are so different from one another as to support the
assumption that they may consist of "non-essential regions" which may
then be deemed to be eligible for modification by a determined
heterologous insert.

Such an approach is possible when the protein or nucleic acid
sequences of several similar storage proteins originating from
different plants have been determined (as is the case for the 2S
albumins). A suitable method then comprises identifying said nucleic
acid regions which encode peptide regions undergoing variability in
either amino acid sequence or length or both, as compared with the
regions which, on the contrary, do exhibit substantial conservation of
amino acid sequence between said several plant species. Where the
storage proteins under study contain cysteine residues and where
further it is thought or known through experimental data that said
cysteines participate in disulfide bridges likely to play an important
part in the establishment of the structure and conformation of the
storage proteins concerned, the method should be extended to take this
into account. In this case, the cysteine residues should not be among
those residues altered by the modification of the storage protein, and
where comparison of the protein sequences of analogous proteins shows
that the distance (in amino acid residues) between cysteines is
conserved, this distance should not be altered by any subsequent
modification. The said non-essential regions in the protein sequence
so selected can then be modified by insertion, into the corresponding
region of the precursor-coding nucleic acid, of the nucleic acid
segments encoding the desired peptide product and, after said
modification has been achieved, the expression of the modified storage
protein in the seeds recoverable at the seed-forming stage of plant
development can be assayed.
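
The comparison method described above can be sketched as a simple scan of an alignment, column by column. The following Python sketch rests on stated assumptions: the three aligned sequences are hypothetical toy strings (not the sequences of figs. 2 and 2A), hyphens stand for absent amino acids, and the function names are introduced here for illustration only.

```python
def variable_columns(alignment):
    """Return the indices of alignment columns that vary in residue or length
    and contain no cysteine; '-' marks an absent amino acid."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for i in range(length):
        column = {s[i] for s in alignment}
        if "C" in column:
            continue  # cysteine positions are never candidates for modification
        if len(column) > 1:
            result.append(i)
    return result

def runs(indices):
    """Group sorted indices into (start, end) runs of consecutive columns."""
    grouped = []
    for i in indices:
        if grouped and i == grouped[-1][1] + 1:
            grouped[-1][1] = i
        else:
            grouped.append([i, i])
    return [tuple(r) for r in grouped]

# Hypothetical toy alignment, for illustration only.
toy_alignment = [
    "MCKLAAGTPCW",
    "MCKL---SPCW",
    "MCKLA--TPCW",
]

candidates = variable_columns(toy_alignment)
candidate_regions = runs(candidates)
```

A region flagged this way is only a candidate: as the text states, its suitability still has to be confirmed by expressing the modified protein.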

Another method, available to a person skilled in the art, for
determining whether a region thought to be amenable to modification is
indeed so consists in making such a modification and expressing the
chimeric gene in any one of several expression systems which, while
not appropriate to produce economically interesting amounts of the
chimeric protein, will, if the chimeric protein is stable, produce
small quantities for analysis. In such experiments, the unmodified
protein should also be brought to expression as a control. Such
systems include, but are not limited to, Xenopus laevis oocytes
(Bassener et al., 1983), transient expression in plant protoplasts
(Fromm et al., 1985), yeast (Hollenberg et al., 1985), plant callus
and the Acetabularia system. The latter has been used by Brown et al.
(1986) for the functional analysis of zein genes and their
modification by sequences encoding lysine.

The choice of precursor-coding nucleic acids encoding the precursors
of 2S proteins, particularly water-soluble 2S proteins, for the
production of the modified nucleic acids to be transferred into the
plant cells to be modified is particularly attractive for the reasons
already of record.

As can be seen in figs. 2 and 2A, the regions which are intercalated
between the first and second cysteines in the small sub-unit of the
protein, on the one hand, and between the fifth and sixth cysteines
and between the seventh and eighth cysteines in the large sub-unit of
the protein, on the other hand, show a substantial degree of
conservation or similarity. It would thus seem that these regions are
in some way essential for the proper folding and/or stability of the
protein when synthesized in the plant seeds.

To the contrary, other regions, such as at the end of the small
sub-unit or at the beginning or end of the large sub-unit, show
differences of such a magnitude that they can be held as presumably
having no substantial impact on the final properties of the protein. A
region which does not seem essential consists of the middle portion of
the region located in the large sub-unit between the sixth and the
seventh cysteine of the mature protein. As visible on the drawing
(Fig. 2), B. napus comprises a GKQQM sequence between the Q amino acid
which precedes it and the V amino acid which follows it, whereas at
the same level A. thali. has no similar sequence at all between the
same neighbouring amino acids and B. excel. and R. comm. comprise
shorter GEQ and GQ peptides respectively. Thus, in addition to the
absence of similarity at the level of the amino acid residues, there
appears a difference in length which makes that region eligible for
substitutions in the longest 2S albumins and for addition of amino
acids in the shortest 2S albumins, or for elongation of both.

The same observations should extend to the level of approximately the
end of the first third of the same region between said sixth and
seventh cysteines: see the sequence of R. communis, which is much
shorter in that region than the corresponding regions of the other
exemplified 2S proteins.

Experimentation, which is within the skills of the person skilled in
the art, will show how many of the other amino acids which neighbour
the abovesaid sixth and seventh cysteines of the mature protein could
further be substituted without disturbing the stability and correct
processing of the hybrid protein. For instance, experimentation will
show how many of the other amino acids which neighbour the abovesaid
GKQQM sequence of B. napus, upstream and downstream thereof, could
further be substituted without loss by the hybrid protein of the
essential properties of the normal B. napus 2S albumin. The
modifications contemplated should preferably not affect the three,
preferably six, amino acids adjacent to the relevant cysteines, e.g.
the sixth and seventh cysteines of the mature 2S protein.

It is of course realized that caution must be exercised against
hypotheses based on arbitrary choices as concerns the bringing into
line of similar parts of proteins which elsewhere exhibit substantial
differences. Nevertheless such comparisons have proven in other
domains of genetics to provide the man skilled in the art with
appropriate guidance to reasonably infer from local structural
differences, on the one hand, and from local similarities, on the
other hand, in similar proteins of different sources, which parts of
such proteins can be modified and which parts cannot, when it is
sought to preserve some basic properties of the non-modified protein
in the same protein yet locally modified by a foreign or heterologous
sequence.

Thus it is prima facie deemed that, subject to verification, any part
of a protein or of a sub-unit thereof may be deemed as eligible for
substitution by a peptide having a different amino acid sequence.

The choice of the adequate non-essential regions to be used in the
process of the invention will also depend on the length of the peptide
of interest. Basically the method of the invention thus allows the
production of biologically active polypeptides in the range of 3-100
amino acids in length. This biologically active polypeptide may have a
vegetal origin or may be a non plant variety specific polypeptide
having a bacterial origin or a fungal origin or an algal origin or an
invertebrate origin or a vertebrate origin such as a mammalian origin.

The sequence (insert) to be inserted in the
appropriate regions of the relevant sequence of the storage
protein, e.g. a 2S protein, or a sub-unit thereof, does
not, normally, include only the segment coding this
polypeptide of interest, but also the codons (or parts
thereof when the contiguous nucleotides of the
non-modified parts of the relevant nucleotide sequences of
the precursor-coding nucleic acid happen to adequately
supplement the codons) encoding aminoacids or peptides
which form the abovesaid aminoacid junctions cleavable,
e.g., by protease or chemical treatment, so that the
peptide of interest can later be recovered from the
purified 2S protein. The junction sequences can be made
either as a double stranded oligomer or, if part of a
gene is available, as a restriction fragment, but in the
latter case the cleavage sites, e.g. protease cleavage
sites, must generally be added.

The choice of sequences bordering the peptide of
interest depends on several factors which essentially
depend on the techniques to be used for purifying that
peptide in the final stages of the process. The peptide
of interest can be flanked by any proteolytic cleavage
sites, provided that the sequence of the peptide of
interest does not contain internal similar cleavage
sites. Finally, the proteases and/or chemical cleavage
reagents should be specific and readily available. They
should correctly cleave the inserted sequence at both
the amino and carboxyl termini. For example, the
protease trypsin cleaves after Arginine or Lysine
residues assuming they are not followed by a Proline.
Thus if neither Arginine nor Lysine residues are present
in the peptide of interest (or if they are followed by a
Proline) the sequence can be flanked by codons encoding
one of those two amino acids. The peptide can then be
cleaved out of the hybrid protein using trypsin,
followed by treatment with the exoprotease
carboxypeptidase B to remove the extra carboxyl terminus
Arg or Lys. Similarly, the protease endo-Lys-C (Jekel et
al., 1983) cleaves after Lysine residues, so that a
peptide could be inserted between two such residues,
cleaved from the 2S albumin using this protease, and the
extra Lysine again removed using carboxypeptidase B.
Such a strategy is particularly useful when the 2S
albumin is used, as the latter is poor in Lysine, so
that only a few fragments are generated, resulting in
easy purification. Cyanogen bromide serves as an example
of a chemical cleavage reagent. Treatment with this
reagent cleaves on the carboxyl side of Methionine.
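
By way of illustration only, the cleavage rules stated above can be sketched in Python as follows. The sketch encodes only the three rules given in the text (trypsin, endo-Lys-C, cyanogen bromide) and ignores any further sequence preferences of the actual reagents. The function names and the flanking residues of the toy hybrid sequence are assumptions made for the example and are not taken from the disclosure.

```python
import re

# Simplified cleavage rules as stated in the text; zero-width patterns mark
# the position immediately after the residue at which the reagent cuts.
RULES = {
    "trypsin": re.compile(r"(?<=[KR])(?!P)"),  # after Arg/Lys unless Pro follows
    "endo-Lys-C": re.compile(r"(?<=K)"),       # after Lys
    "CNBr": re.compile(r"(?<=M)"),             # carboxyl side of Met
}


def fragments(protein, reagent):
    """Split a one-letter protein sequence at every site the reagent cuts."""
    cuts = [m.start() for m in RULES[reagent].finditer(protein)]
    bounds = [0] + [c for c in cuts if 0 < c < len(protein)] + [len(protein)]
    return [protein[a:b] for a, b in zip(bounds, bounds[1:])]


def has_internal_site(peptide, reagent):
    """True if the reagent would cut inside the peptide of interest."""
    return len(fragments(peptide, reagent)) > 1


# Hypothetical hybrid region: Leu-enkephalin (YGGFL) flanked by two lysines.
hybrid = "AAK" + "YGGFL" + "K" + "GG"
print(has_internal_site("YGGFL", "trypsin"))  # False: trypsin is usable
print(fragments(hybrid, "trypsin"))           # ['AAK', 'YGGFLK', 'GG']
```

The released fragment still carries the C-terminal Lys, which corresponds to the carboxypeptidase B step described above.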

Thus, for each case a separate strategy must be
developed, but the wide variety of protease cleavage
techniques available allows the same basic principles to
be followed. As often as possible, strategies should use
economical commercially available proteases or reagents,
and purification steps limited in number. For reviews of
various enzymatic and chemical cleavage techniques see
volumes 19 (1970) and 47 (1977) of Methods in Enzymology.

Finally, some peptides are found in nature with
C-terminal alpha-amide structures (alpha-melanotropin,
calcitonin, and others; see Hunt and Dayhoff, 1976).
This post-translational modification has been shown to
be of essential importance for the biological activity
of the peptide. Such a C-terminally amidated peptide can
be obtained by transformation of a C-terminal glycine
residue into an amide group (Seiringer et al., 1985).
Therefore such peptides can be generated from the 2S
hybrid protein by adding a C-terminal glycine residue to
the peptide which, after purification, is transformed
into an amide group.

When the complete protein sequence of the region to
be inserted into the storage protein has been
determined, including both the polypeptide of interest and the
aminoacids or peptides which form the above described
cleavable junctions, the nucleotide sequence to encode
said protein sequence must be determined. It will be
recognized that, while perhaps not absolutely necessary,
the codon usage of the encoding nucleic acid should
where possible be similar to that of the gene being
modified. The person skilled in the art will have access
to appropriate computer analysis tools to determine said
codon usage.
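
A minimal sketch of such a codon usage analysis is given below, by way of illustration only. It counts the codons of an in-frame coding sequence and back-translates a peptide using, for each aminoacid, the synonymous codon most frequent in that sequence. The `host_fragment` sequence is a made-up toy sequence and not a real storage protein gene, the function names are assumptions, and the synonym table is deliberately restricted to the residues of the YGGFLK example.

```python
from collections import Counter

# Synonymous codons (standard genetic code) for the residues of YGGFLK only.
SYNONYMS = {
    "Y": ["TAT", "TAC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "F": ["TTT", "TTC"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
}


def codon_usage(cds):
    """Count the codons of an in-frame coding sequence (DNA, 5' to 3')."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def back_translate(peptide, host_cds, synonyms=SYNONYMS):
    """Encode a peptide with the codons most used in the host sequence."""
    usage = codon_usage(host_cds)
    # Ties (including codons never seen) fall back to the first listed codon.
    return "".join(max(synonyms[aa], key=lambda c: usage[c]) for aa in peptide)


# Hypothetical fragment standing in for the gene to be modified.
host_fragment = "GGTGGTGGCAAGAAGAAATTCTACCTT"
print(back_translate("YGGFLK", host_fragment))  # TACGGTGGTTTCCTTAAG
```

In practice the usage table would be computed over the complete coding sequence of the gene being modified rather than over a short fragment.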

Any appropriate genetic engineering technique may
be used for substituting the insert for part of the
selected precursor-coding nucleic acid, or for inserting
it in the appropriate region of said precursor-coding
nucleic acid. The general in vitro recombination
techniques followed by cloning in bacteria can be used
for making the chimeric genes. Site-directed mutagenesis
can be used for the same purposes, as further exemplified
hereafter. DNA recombinants, e.g. plasmids suitable for
the transformation of plant cells, can also be produced
according to techniques disclosed in current technical
literature. The same applies finally to the production
of transformed plant cells in which the hybrid storage
protein encoded by the relevant parts of the selected
precursor-coding nucleic acid can be expressed. By way
of example, reference can be made to the published
European application No. 116 718 or to International
application WO 84/02913 (incorporated herein by
reference), which disclose appropriate techniques
to that effect.

The preceding discussion has been based more
specifically, by way of example, on the modification of
storage 2S albumin. It will be understood that the
process of this invention can also be carried out upon
using any other type of 2S-storage protein or any other
storage protein having another sedimentation
coefficient (e.g. a 7S-, 11S- or 12S-storage protein)
or the same, provided that the DNA sequences which
encode it in the plant from which it can be isolated
have been or can be identified and that non-essential or
"hypervariable subsequences" therein have been or can be
detected.

Examples (by way of illustration only) of such other
storage proteins consist (see also Higgins (1984) for
review):

- of other albumins, which are water soluble storage
proteins, which may be either 12S like such as the
lectins isolatable from pea and various beans, or
2S like such as the 2S albumins already on record or
other 2S albumins isolatable from pea, radish and
sunflower;

- of globulins, which are storage proteins soluble in
salt solutions, which may be either 7-8S like such as
the phaseolins isolatable from Phaseolus, the vicilins
isolatable from pea, the conglycinins isolatable from
soybean, the oat-vicilins isolatable from oat, or
11-14S like, such as the legumins isolatable from pea,
the glycinins isolatable from soybean, the helianthins
isolatable from sunflower or other 11-14S globulins
isolatable from beans, Arabidopsis, and probably from
wheat;

- of prolamins, which are alcohol soluble storage
proteins, such as the zeins isolatable from corn, the
hordeins isolatable from barley, the gliadins isolatable
from wheat and the kafirins isolatable from sorghum;

- of glutelins, which are storage proteins soluble under
low pH conditions and isolatable from wheat.

Some of these storage proteins, merely cited by way
of examples, are poor in cysteines. Yet the different
proteins of a same group do show variable regions on the
one hand, better conserved regions on the other hand.

Needless to say, these storage proteins could
be used as suitable vectors for the production of the
abovesaid hybrid proteins and their respective
purifications from the seed proteins, upon relying on
their respective specific solubility characteristics in
the corresponding solvents.

The procedures which have been disclosed generally
hereabove apply to the adequate modification of the
non-essential regions of any of said other storage
proteins by a heterologous insert containing a DNA
sequence encoding the peptide of interest and then to
the transformation of the relevant plants with the
chimeric gene obtained for the production of a hybrid
protein containing the sequence of the peptide of
interest in the seeds of the relevant plant, and they
apply to the recovery of the peptide of interest from
said plants. Needless to say, the person skilled in
the art will in all instances be capable of selecting which
of the existing techniques would best fulfill his
needs at the level of each step of the production of
such modified plants, to achieve the best production
yields of said peptide of interest.

The preceding discussion has been based more
specifically, by way of example, on the modification of
the hypervariable region of a determined storage protein
by an insert encoding a biologically active peptide. It
will be understood that the person skilled in the art may
choose as insert a sequence which encodes repeats of said
biologically active peptide, wherein every sequence
encoding said biologically active peptide is separated
from the other by border sequences encoding selective
cleavage sites which allow their separation during
purification.
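
By way of illustration only, the arrangement of repeats separated by cleavable border residues can be sketched as follows at the protein level. The sketch assumes a single-residue border (Lys, as in the trypsin or endo-Lys-C strategy described earlier) and models the release as an endoprotease cut after each border residue followed by exopeptidase removal of the C-terminal border residue. The function names and the choice of three copies are assumptions of the example.

```python
def build_repeat_insert(peptide, border, copies):
    """Border-flanked repeats, e.g. K-YGGFL-K-YGGFL-K for two copies."""
    if border in peptide:
        raise ValueError("peptide contains an internal border residue")
    return border + (peptide + border) * copies


def release_repeats(insert, border):
    """Cut after every border residue, then trim that residue from each piece."""
    pieces, current = [], ""
    for residue in insert:
        current += residue
        if residue == border:  # endoprotease cleaves after the border residue
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    # Exopeptidase step: remove the C-terminal border residue, drop empty pieces.
    trimmed = [p[:-1] if p.endswith(border) else p for p in pieces]
    return [p for p in trimmed if p]


insert = build_repeat_insert("YGGFL", "K", 3)
print(insert)                        # KYGGFLKYGGFLKYGGFLK
print(release_repeats(insert, "K"))  # ['YGGFL', 'YGGFL', 'YGGFL']
```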

For instance the following process can be used in 
order to exploit the capacity of a storage protein, to 
be used as a suitable vector for the production in seeds 
of a determined polypeptide of interest or repeats 
thereof, when the corresponding precursor-coding nucleic 
acid has been sequenced. Such process then comprises: 

1) locating and selecting one of said relevant
sequences of the precursor-coding nucleic acid which
comprises a non-essential region encoding a peptide
sequence which can be modified by substituting an insert
for part of it or by inserting said insert into it,
which modification is compatible with the conservation
of the configuration of the storage protein;

2) inserting a nucleic acid insert in the selected
region of said precursor nucleic acid in appropriate
reading frame relationship with the non-modified parts
of said relevant sequence, which insert includes a
determined segment encoding the polypeptide of interest
or repeats thereof and, downstream and upstream of said
determined segment, suitable nucleotides, codons or
triplets of nucleotides which, after said insertion into
the precursor-coding nucleic acid has been achieved,
participate in the formation of codons encoding
aminoacid junctions linking the polypeptide of interest
or its individual repeats to each other and to the
relevant parts of the storage protein or sub-unit
thereof, whereby said aminoacid junctions define border
sites surrounding the peptide of interest and which can
themselves be selectively cleaved, e.g. by specific
peptidases;

3) inserting the modified precursor-coding nucleic
acid obtained in a plasmid suitable for the
transformation of plant cells which can be regenerated
into full seed-forming plants, wherein said insertion is
brought under the control of regulation elements,
particularly a seed specific promoter capable of
providing for the expression in the seeds of said plants
of the open-reading frames associated therewith;

4) transforming a culture of such plant cells with
such modified plasmid;

5) assaying the expression of the chimeric storage
protein having inserted into its hypervariable region
the determined sequence of the segment encoding the
polypeptide of interest or the repeats thereof and, when
achieved,

6) regenerating said plants from the transformed
plant cells obtained and growing said plants up to the
seed forming stage;

7) recovering the seeds and extracting the storage
proteins contained therein;

8) cleaving said storage proteins, e.g. with said
specific peptidases, isolating and recovering the
peptide of interest.

In the case of storage 2S-proteins which contain a
substantial number of cysteine residues, which storage
proteins are preferred at the present time, and further
when the precursor-coding nucleic acids of several
similar proteins performing the same functions in
different plants, yet originating from said different
plants respectively, are available and have been (or can
be) sequenced, step 1) of the general process defined
above may be carried out as follows (it being understood
that the sequence of steps recited hereafter is optional
and can be replaced by any other procedure aiming at
achieving the same result). Said "step 1" then
comprises:



WO 89/03887 



PCT/EP88/00944 




a) selecting several of said plant storage
proteins, available and identifiable in several seed
forming plant species respectively;

b) locating the precursor-coding nucleic acid
sequence which in each of said plant species encodes the
precursor of said plant storage protein and determining
in said precursor-coding nucleic acid a relevant
nucleotide sequence consisting of a sequence encoding
the mature storage protein or an appropriate
sub-sequence encoding a sub-unit of said mature
storage protein;

c) determining the relative positions of the codons
which encode the successive cysteine residues in said
mature protein or protein sub-units and identifying
the corresponding successive nucleic acid regions
located upstream of, between, and downstream of said
codons within said sub-sequences of the precursor-coding
nucleic acid and identifying in said successive regions
those parts which undergo variability in either
aminoacid sequence or length or both from one plant
species to another as compared with those other regions
which do exhibit substantial conservation of aminoacid
sequence in said several plant species, one of said
nucleotide regions being then selected for the insertion
therein of the nucleic acid insert including the segment
encoding the peptide of interest or repeats thereof,
e.g. as disclosed under 2) hereabove.
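
By way of illustration only, step c) can be sketched at the protein level as follows: each mature sequence is split at its cysteines, and the regions upstream of, between and downstream of the cysteines are compared across species for differences in length or sequence. The two sequences used are invented toy sequences and not real 2S albumins, the function names are assumptions, and the sketch requires that all sequences carry the same number of cysteines.

```python
def intercysteine_segments(protein):
    """Regions upstream of, between, and downstream of the cysteines."""
    return protein.split("C")


def compare_regions(proteins):
    """Report, per region, the lengths seen and whether all species agree."""
    seg_lists = [intercysteine_segments(p) for p in proteins.values()]
    if len({len(segs) for segs in seg_lists}) != 1:
        raise ValueError("sequences do not share the same number of cysteines")
    report = []
    for index, segs in enumerate(zip(*seg_lists)):
        report.append({
            "region": index,
            "lengths": sorted({len(s) for s in segs}),
            "conserved": len(set(segs)) == 1,
        })
    return report


# Hypothetical mature sequences from two species (toy data).
toy = {"species_a": "AACGGCQQQQCLL", "species_b": "AACGGCQECLL"}
variable = [r["region"] for r in compare_regions(toy) if not r["conserved"]]
print(variable)  # [2]: the region between the second and third cysteines
```

A region flagged as variable in both sequence and length would be a candidate for the insertion, subject to the experimental verification discussed above.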

Hence the last mentioned embodiment of the invention
provides that in having the heterologous polypeptide of
interest or repeats thereof made as part of a hybrid
protein in a plant, it will pass the plant protein
disulfide isomerase during membrane translocation, thus
increasing the chances that the correct disulfide
bridges be formed in the hybrid precursor as in its
normal precursor situation, on the one hand, and that
the polypeptide of interest or repeats thereof be
protected against the different drawbacks which have
been recalled above as concerns the standard genetic
engineering techniques for producing foreign peptides in
host microorganisms, on the other hand.
The invention further refers to the recombinant
nucleic acids themselves for use in the process of the
invention; particularly to the

- recombinant precursor encoding nucleic acid
defined in the frame of said process;
- recombinant nucleic acids containing said
modified precursor-coding nucleic acid under the
control of a seed-specific promoter, whether the
latter originates from the same DNA as that of
said precursor-coding nucleic acid or from a DNA
of another plant,
- vectors, more particularly plant plasmids e.g.,
Ti-derived plasmids modified by any of the
preceding recombinant nucleic acids for use in the
transformation of the above plant cells.
The chimeric gene should be provided with a suitable
signal sequence if it does not possess one (which all
storage proteins do).

The invention also relates to the regenerable source
of a polypeptide of interest, which is formed of
either plant cells of a seed-forming plant, which plant
cells are capable of being regenerated into the full
plant, or seeds of said seed-forming plants wherein said
plants or seeds have been obtained as a result of one or
several generations of the plants resulting from the
regeneration of said plant cells, wherein further the
DNA supporting the genetic information of said plant
cells or seeds comprises a nucleic acid or part thereof,
including the sequences encoding the signal peptide,
which can be transcribed in the mRNA corresponding to
the precursor of a storage protein of said plant, placed
under the control of a seed specific promoter, and

- wherein said nucleic acid sequence contains a
relevant modified sequence encoding the mature
storage protein or one of the several sub-sequences
encoding the corresponding one or several
sub-units of said mature storage protein,
- wherein further the modification of said
relevant sequence takes place in one of its non
essential regions and consists of a heterologous
nucleic acid insert forming an open-reading frame
in reading phase with the non modified parts which
surround said insert in the relevant sequence,
- wherein said insert includes a nucleotide
segment encoding said polypeptide of interest,
- wherein said heterologous nucleotide segment is
linked to the adjacent extremities of the
surrounding non modified parts of said relevant
sequence by one or several codons whose
nucleotides belong either to said insert or to
the adjacent extremities or to both,
- wherein said one or several codons encode one or
several aminoacid residues which define
selectively cleavable border sites surrounding the
peptide of interest in the hybrid storage protein
or storage protein sub-unit encoded by the
modified relevant sequence.

It is to be considered that although the invention
should not be deemed as being limited thereto, the nucleic
acid inserts encoding the polypeptide of interest or repeats
thereof will in most instances be man-made synthetic
oligonucleotides or oligonucleotides derived from viral or
bacterial genes or from cDNAs derived from viral or
bacterial RNAs, or further from non-plant eucaryotic
genes, all of which shall normally escape any possibility
of being inserted at the appropriate places of the plant
cells or seeds of this invention through biological
processes, whatever the nature thereof. In other words,
these inserts are usually "non plant variety specific",
especially in that they can be inserted in different kinds
of plants which are genetically totally unrelated and thus
incapable of exchanging any genetic material by standard
biological processes, including natural hybridization
processes.

Thus the invention further relates to the seed
forming plants themselves which have been obtained from
said transformed plant cells or seeds, which plants are
characterized in that they carry said hybrid
precursor-coding nucleic acids associated with a seed
promoter in their cells, said inserts however being
expressed and the corresponding hybrid protein produced
mostly in the seeds of said plants.

There follows an outline of a preferred method
which can be used for the modification of 2S seed storage
protein genes, their expression in transgenic plants, the
purification of the 2S storage protein, and the recovery
of the biologically active peptide of interest. The
outline of the method given here is followed by a specific
example. It will be understood by the person skilled in
the art that the method can be suitably adapted for the
modification of other 2S seed storage protein genes.

1. Replacement or supplementation of the hypervariable
region of the 2S storage protein gene by the sequence
of interest.

Either the cDNA or the genomic clone of the 2S
albumin can be used. Comparison of the sequences of the
hypervariable regions of the genes in figure 2 shows that
they vary in length. Therefore if the sequence of interest
is short and a 2S albumin with a relatively short
hypervariable region is used, the sequence of interest can be
inserted. Otherwise part of the hypervariable region is
removed, to be replaced by the insert containing the
segment or sequence of interest and, if appropriate, the
border codons. The resulting hybrid storage protein may be
longer or shorter than the non-modified natural storage
protein which has been modified. In either case two
standard techniques can be applied: convenient
restriction sites can be exploited, or mutagenesis vectors
(e.g. Stanssens et al., 1987) can be used. In both cases,
care must be taken to maintain the reading frame of the
message.
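
By way of illustration only, the requirement to maintain the reading frame can be sketched as a simple check. A replacement keeps the frame when the length of the insert and the length of the removed part differ by a multiple of three nucleotides. The sketch additionally rejects inserts which create an in-frame stop codon before the end of the message. The toy coding sequence, the coordinates and the function name are assumptions of the example. The insert shown encodes Lys-Tyr-Gly-Gly-Phe-Leu-Lys.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def replace_region(cds, start, end, insert):
    """Replace cds[start:end] by insert, refusing frame shifts and early stops.

    Assumes cds is upper-case DNA, in frame from position 0, and a whole
    number of codons long.
    """
    if (len(insert) - (end - start)) % 3:
        raise ValueError("replacement would shift the reading frame")
    new = cds[:start] + insert + cds[end:]
    codons = [new[i:i + 3] for i in range(0, len(new) - 2, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("replacement introduces a premature stop codon")
    return new


# Hypothetical toy message: ATG GCT GCT GCT TGA
cds = "ATGGCTGCTGCTTGA"
insert = "AAATACGGTGGTTTCCTTAAA"  # K Y G G F L K
print(replace_region(cds, 3, 9, insert))
# ATGAAATACGGTGGTTTCCTTAAAGCTTGA
```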

2. The altered 2S albumin coding region is placed under
the control of a seed specific gene promoter.

A seed specific promoter is used in order to
ensure subsequent expression in the seeds only. This
facilitates recovery of the desired product and avoids
possible stresses on other parts of the plant. In
principle the promoter of the modified 2S albumin can be
used. But this is not necessary. Any other promoter
serving the same purpose can be used. The promoter may be
chosen according to its level of efficiency in the plant
species to be transformed. In the examples below a lectin
promoter from soybean and a 2S albumin promoter from
Arabidopsis are used. If a chimeric gene is so
constructed, a signal peptide encoding region must also be
included, either from the modified gene or from the gene
whose promoter is being used. The actual construction of
the chimeric gene is done using standard molecular
biological techniques (see example).

3. The chimeric gene construction is transferred into the
appropriate host plant.

When the chimeric or modified gene construction is
complete it is transferred in its entirety to a plant
transformation vector. A wide variety of these, based on
disarmed (non-oncogenic) Ti-plasmids derived from
Agrobacterium tumefaciens, are available, both of the binary and
cointegration forms (De Blaere et al., 1987). A vector
including a selectable marker for transformation, usually
antibiotic resistance, should be chosen. Similarly, the
methods of plant transformation are also numerous, and are
fitted to the individual plant. Most are based on either
protoplast transformation (Marton et al., 1979) or
transformation of a small piece of tissue from the adult plant
(Horsch et al., 1985). In the example below, the vector is
a binary disarmed Ti-plasmid vector, the marker is
kanamycin resistance, and the leaf disc method of
transformation is used.

Calli from the transformation procedure are
selected on the basis of the selectable marker and
regenerated to adult plants by appropriate hormone
induction. This again varies with the plant species being
used. Regenerated plants are then used to set up a stable
line from which seeds can be harvested.

4. Recovery of biologically active polypeptides.

The purification of 2S plant albumins is well
established (Youle and Huang, 1981; Ampe et al., 1986).
The 2S albumin is a major protein in mature seeds and is highly soluble
in aqueous buffers. A typical purification of 2S-storage
proteins involves the following steps: 1, homogenization
of seed in dry ice and extraction with hexane; 2,
extraction with high salt buffer and dialysis against distilled
water, precipitating the contaminating globulins; 3,
further purification of the water soluble fraction by
gel-filtration chromatography, which separates the smaller
2S-storage proteins from the larger contaminants; and 4,
final purification by ion-exchange chromatography. The
exact methods used are not critical to the technique
described here, and a wide range of classical techniques,
including gel filtration, ion exchange and reversed phase
chromatography, and affinity or immunoaffinity
chromatography, may be applied both to purify the chimeric 2S
albumin and, after it is cleaved from the albumin, the
biologically active peptide. The exact techniques used for
this cleavage will be determined by the strategy decided
upon at the time of the design of the flanking sequences
(see above). As 2S albumins are somewhat resistant to
proteases, denaturation steps should often be included
before protease treatment (see example).

5. Assays for biologically active peptides.

Assays for the recovered product are clearly
dependent on the product itself. For initial screening of
plants, immunological assays can be used to detect the
presence of the peptide of interest. Antibodies against
the desired product will often function even while it is
still part of the hybrid 2S protein. If not, it must be
partially or completely liberated from the hybrid, after
which peptide mixtures can be used. The screening with
antibodies can be done either by classical ELISA
techniques (Engvall and Pesce, 1978) or be carried out on
nitrocellulose blots of proteins previously separated by
polyacrylamide gel electrophoresis (Western blotting,
Towbin et al., 1979). The purified peptide can be further
analysed and its identity confirmed by amino acid
composition and sequence analysis.

Bioassays for biological activity will of course
depend upon the nature and function of the final peptide
of interest.

It has to be understood that the present invention
is also applicable for the production of labeled proteins,
which may be biologically active, using the plant seed
storage proteins as suitable vectors. In this case, plant
regeneration of the obtained transformants, as described
under point 3 hereabove, has to occur under conditions in
which labeled carbon sources (13C) and/or nitrogen sources
(15N) and/or hydrogen sources (2H) and/or sulphur sources
(35S) and/or phosphorus sources (32P) are provided to
the transformed growing plants (Kollman et al., 1979;
Jung and Jettner, 1972; De Wit et al., 1978).

Further characteristics of the invention will
appear in the course of the non-limiting disclosure of
specific examples, particularly on the basis of the
drawings in which:

- Figs. 1, 1A, 2A, 2 and 3 refer to overall
features of 2S-storage proteins as already discussed
above.

- Fig. 4 represents part of the sequence of the
Brazil nut 2S-albumin obtained from the pBN2S1 plasmid
obtained as indicated hereafter and related elements.

- Fig. 5 represents restriction sites used in the
constructions shown in other drawings.

- Figs. 6 and 7 show diagrammatically the successive
phases of the construction of a chimeric plasmid
including a restriction fragment containing the nucleic acid
encoding a precursor (the herein so-called
"precursor-coding nucleic acid"), the whole suitable for modification
by an insertion of DNA sequences encoding a polypeptide of
interest, particularly through site-directed mutagenesis.

- Fig. 8 shows the restriction sites and genetic
map of a plasmid suitable for the performance of the above
site-directed mutagenesis.

- Fig. 9 shows diagrammatically the different
steps of the site-directed mutagenesis procedure of
Stanssens et al. (1987) as generally applicable to the
modification of nucleic acid at appropriate places.

- Figs. 10, 11 and 12 illustrate diagrammatically
the further steps of the modification of the abovesaid
chimeric plasmid including said precursor nucleic acid to
include therein, in a non essential region of its
precursor nucleic acid sequence, an insert encoding a
polypeptide of interest, Leu-enkephalin by way of example in the
following disclosure.

- Fig. 13 represents the sequence of a 1 kb fragment
containing the Arabidopsis thaliana 2S albumin gene and
shows related elements.

- Fig. 14 provides the protein sequence of the
large sub-unit of the above Arabidopsis 2S protein
together with related oligonucleotide sequences.
- Fig. 15 represents the restriction map of
pGSC1703.

- Fig. 16 represents the restriction map of
pGSC1703A.

- Fig. 17A represents a chromatogram of an aliquot
of the synthetic peptide YGGFLK, used as marker, on a C4
column. The gradient (dashed line) is isocratic at 0%
solvent B between 0 and 5 minutes, and solvent B increases
to 100% in 70 minutes. Solvent A: 0.1% TFA in water;
solvent B: 0.1% TFA in 70% CH3CN.

- Fig. 17B represents a chromatogram of a tryptic
digestion of oxidized 2S under the same conditions as
in Fig. 17A. The hatched peak was collected and subjected
to further purification.

- Fig. 18A represents a chromatogram of an aliquot
of the synthetic peptide YGGFLK, used as a marker, on a
C18 column. The gradient (dashed line) is isocratic at 0%
solvent B between 0 and 5 minutes, and solvent B increases
to 100% in 70 minutes. Solvent A: 0.1% TFA in water;
solvent B: 0.1% TFA in 70% CH3CN.

- Fig. 18B represents the rechromatography on the
C18 column of the YGGFLK containing peak obtained from
HPLC on the C4 column (see Fig. 17B). The running
conditions are the same as for Fig. 18A.

- Fig. 19 represents the results of the aminoacid
sequence determination on YGGFLK. The left corner box
shows the standard of PTH-amino acids (20 pmol each). The
signal for cycles 1 to 6 is 8 times more attenuated than the
reference.

- Fig. 20A represents a chromatogram showing the
YGGFL peptide used as marker. This peptide is the result
of a carboxypeptidase B digestion of the synthetic peptide
YGGFLK. The running conditions are the same as in Fig. 17A.

- Fig. 20B shows the isolation of the YGGFL peptide,
indicated with *, after carboxypeptidase B digestion of
the YGGFLK peptide that has been isolated from the plant
material.

- Fig. 21 shows diagrammatically the successive
phases of the construction of a chimeric
Arabidopsis thaliana 2S albumin gene including the deletion of
practically all parts of the hypervariable region and its
replacement by an AccI site, the insertion of the sequences
encoding the GHRF and cleavage sites, given by way of
example in the following disclosure, in the AccI site,
particularly through site-directed mutagenesis, and the
cloning of said chimeric gene in a plant vector suitable for
plant transformation.

- Fig. 22A shows the eight oligonucleotides used
in the constructions of the GHRFS and GHRFL genes. The
limits of the oligonucleotides are indicated by vertical
lines, and the numbers above and below said
oligonucleotides indicate their number. In oligonucleotides 4 and 8
the bases enclosed in the box are excluded, resulting in
the gene encoding GHRFS. The peptide sequence of said
GHRFS and GHRFL and the methionine sequences providing the
CNBr cleavage sites are shown above the DNA sequence.

- Fig. 22B shows the AccI site of the modified
AT2S1 gene and the insertion of said GHRFs in said AccI
site in such a way that the open reading frame is
maintained.

Example I:

As a first example of the method described, a
procedure is given for the production of Leu-enkephalin, a
pentapeptide with opiate activity in the human brain and
other neural tissues (Hughes et al., 1975a). A synthetic
oligomer encoding the peptide and specific protease
cleavage sites is substituted for part of the
hypervariable region in a cDNA clone encoding the 2S
albumin of Bertholletia excelsa (Brazil nut). This
chimeric gene is fused to a fragment containing the
promoter and signal peptide encoding regions of the
soybean lectin gene. Lectin is a 7S albumin seed storage
protein (Goldberg et al., 1983). The entire construct is
transferred to tobacco plants using an Agrobacterium
mediated transformation system. Plants are regenerated,
and after flowering the seeds are collected and the 2S
albumins purified. The enkephalin peptide is cleaved from
the 2S albumin using the two specific proteases whose
cleavage sites are built into the oligonucleotide, and
then recovered using HPLC techniques.
1. cDNA synthesis and screening. 

Total RNA is isolated from nearly mature seeds of
the Brazil nut using the method described by Harris and
Dure (1981). Poly A+ RNA is then isolated using oligo dT
chromatography (Maniatis et al., 1982). cDNA synthesis and
cloning can be done using any of several published methods
(Maniatis et al., 1982; Okayama and Berg, 1982; Land et
al., 1981; Gubler and Hoffman, 1983). In the present case,
the 2S albumin from Brazil nut was sequenced (Ampe et al.,
1986), and an oligonucleotide based on the amino acid
sequence was constructed. This was used to screen a cDNA
library made using the method of Maniatis et al. (1982).
The resulting clone proved to be too short, and a second
library was made using the method of Gubler and Hoffman
(1983) and screened using the first, shorter cDNA clone. A
DNA recombinant containing the Brazil nut 2S albumin
sequence was isolated. The latter was further cloned in
plasmid pUC18 (Yanisch-Perron, C., Vieira, J. and
Messing, J. (1985) Gene 33, pp. 103-119).

The recovered plasmid was designated pBN2S1. The
derived protein sequence, the DNA sequence, the region to
be substituted, and the relevant restriction sites are
shown in fig. 4.

The deduced protein sequence (obtained from
plasmid pBN2S1) is shown above the DNA sequence, and the
proteolytic processing sites are indicated (in fig. 4).
The end of the signal sequence is indicated.
Restriction sites used in the constructions in figures 6, 7,
10, 11 and 12 are indicated. The polylinker of the cloning
vector is shown in order to indicate the PstI site used in
the latter part of the construction. The protein and DNA
sequences of the peptide to be inserted are shown below
the cDNA sequence, as well as the rest of the
oligonucleotide to be used in the mutagenesis. During the
mutagenesis procedure the oligonucleotide shown is
hybridized to the opposite strand of the cDNA (see figure 10).

2. Construction of a chimeric gene.

The 2S albumin gene is first fused to the DNA 
fragment encoding the promoter and signal peptide of the
soybean lectin gene. The cleavage point of the signal 
peptide in both lectin and Brazil nut is derived from 
standard consensus sequences (Perlman and Halvorson, 
1983). The relevant sequences are shown hereafter as well 
as in figure 4. 
pLE1

     A   N * S   A
    GCA AAC TCA GCG
    CGT TTG AGT CGC
    DdeI
    C/TNAG

pSoyLea1

     A   N * S   D   L
    GCA AAC TCA GAT CTG
    CGT TTG AGT CTA GAC
    BglII
    A/GATCT

pBN2S1

     T   A * F   R   A   T
    ACC GCC TTC CGG GCC ACC
    TGG CGG AAG GCC CGG TGG
    BglI
    GCCNNNN/NGGC

The protein and double stranded DNA sequences in
the regions of the signal peptide/mature protein sequences
in the plasmids pLE1, pSoyLea1 and pBN2S1 are shown in
figure 5. The positions and recognition sites of the
restriction sites used in the constructions shown in the
drawings are indicated. * indicates the protein cleavage
site at the end of the signal sequence.
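
The junction sequences listed above can be checked mechanically. The following is a minimal sketch, not part of the original disclosure: it takes the upper strands as printed, derives the lower strands by base complementation, and locates the DdeI, BglII and BglI recognition sites quoted in the text. The helper names (`complement`, `cut_positions`) and the dictionary layout are illustrative assumptions; cut positions are reported on the upper strand only, and end filling or resection is not modelled.

```python
import re

# Upper strands of the three junction regions, as printed in the text.
JUNCTIONS = {
    "pLE1": "GCAAACTCAGCG",
    "pSoyLea1": "GCAAACTCAGATCTG",
    "pBN2S1": "ACCGCCTTCCGGGCCACC",
}

# Recognition pattern and offset of the upper-strand cut within the site,
# taken from the notations C/TNAG, A/GATCT and GCCNNNN/NGGC in the text.
SITES = {
    "DdeI": ("CT[ACGT]AG", 1),
    "BglII": ("AGATCT", 1),
    "BglI": ("GCC[ACGT]{5}GGC", 7),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Return the base-by-base complement (lower strand, read left to right)."""
    return seq.translate(_COMPLEMENT)


def cut_positions(seq, enzyme):
    """Return 0-based indices of the first base after each upper-strand cut."""
    pattern, offset = SITES[enzyme]
    return [m.start() + offset for m in re.finditer(pattern, seq)]


for name, top in JUNCTIONS.items():
    print(name, top, complement(top))

print("DdeI in pLE1:", cut_positions(JUNCTIONS["pLE1"], "DdeI"))
print("BglII in pSoyLea1:", cut_positions(JUNCTIONS["pSoyLea1"], "BglII"))
print("BglI in pBN2S1:", cut_positions(JUNCTIONS["pBN2S1"], "BglI"))
```

In this sketch the DdeI cut in pLE1 falls immediately after the AAC codon, which is where the signal sequence processing site is marked.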

The starting point for the construction is the
plasmid pLE1 (Okamuro et al., 1987), which contains a
soybean genomic HindIII fragment. This fragment includes
the entire soybean lectin gene, its promoter, and
sequences upstream of the promoter which may be important
for seed specific expression. From this fragment a
suitable soybean lectin promoter/signal sequence cassette
was constructed as shown in fig. 6a. A DdeI site is
present at the end of the sequence encoding the signal
sequence (SS), and its cleavage site (C/TCAG) corresponds
to the processing site. To obtain a useful restriction
site at this processing site, a KpnI-DdeI fragment of the
SS sequence (hereafter designated as "ss") is isolated
from pLE1 and cloned into pLK57 (Botterman, 1986) itself
linearized with KpnI and BglII. The DdeI and BglII ends
are filled in with Klenow DNA polymerase I. This
reconstructs the BglII site (A/GATCT), whose cleavage site
now corresponds to the signal sequence processing site
(see fig. 6, 7a). The plasmid so obtained, pSoyLea1, thus
consists of plasmid pLK57 in which the KpnI-DdeI fragment
of the SS sequence (ss) initially contained in pLE1 is
substituted for the initial KpnI-BglII fragment of pLK57.
A HindIII site is placed in front of this fragment by
substituting a KpnI-PstI fragment containing said HindIII
site from pLK69 (Botterman, 1986) for the PstI-KpnI
fragment designated by (1) in pSoyLea1 as shown
diagrammatically in fig. 4. This intermediate construction is
called pSoyLea2. In a second step the lectin promoter is
reconstructed by inserting the HindIII-KpnI fragment (2)
of pLE1 in pSoyLea2. As there is another BglII site
present upstream of the promoter fragment, the lectin
promoter/signal sequence cassette is now present as a
BglII-BglII fragment in the plasmid pSoyLea3.

This cassette is now fused, in register, with a
205 bp Brazil nut cDNA fragment of plasmid pBN2S1
containing the coding sequences for the Brazil nut pro-2S
albumin (i.e., the entire precursor molecule with the
exception of the signal sequence). This is done as shown
in figure 5. The 205 bp fragment obtained after digestion
of the cDNA clone pBN2S1 (fig. 4) with BglI, treatment
with Klenow DNA polymerase I to resect the BglI protruding
ends, and digestion with PstI is cloned into pUC18
(Yanisch-Perron et al., 1985) which has been linearized by
digestion with SmaI and PstI. The resulting plasmid,
pUC18-BN1, is digested with both EcoRI and AvaI, both ends
filled in, and religated. This results in the
construction of a new plasmid, designated pUC18-BN2, containing
the desired Brazil nut coding sequence with an EcoRI site
at the beginning (fig. 7).

To fuse the Brazil nut coding sequences in
register to the lectin promoter/signal sequence cassette,
pUC18-BN2 is digested with EcoRI and the ends partially
filled in using Klenow enzyme in the presence of dATP
alone. The remaining overhanging nucleotides are removed
with S1 nuclease, after which a PstI digest is carried
out. This yields a fragment with one blunt end and one
PstI digested end. The lectin promoter/signal sequence
fragment is taken from pSoyLea1 (fig. 7) as an EcoRI-BglII
fragment with filled in BglII ends. The two fragments are
ligated together with PstI-EcoRI digested pUC18. This
results in pUC18SLBN1, with a reconstructed BglII site at
the junction of the signal peptide encoding sequence and
the Brazil nut sequences (fig. 7). pUC18SLBN1 thus
consists of the pUC18 plasmid in which there have been
inserted the BglII-EcoRI fragment (shown by (3) on fig. 6)
of pSoyLea1 and, upstream thereof in the direction of
transcription, the EcoRI-Pst-EcoRI fragment supplied by
pUC18-BN2 and containing the 205 bp cDNA coding sequence
for the Brazil nut pro-2S albumin.
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However, the reading frame is not properly 
maintained. In order to correct this, the plasmid is
linearized with BglII, treated with S1 nuclease, and
religated. This intermediate is designated pUC18SLBN2. The
construction is finally completed in two steps by
inserting the KpnI fragment carrying the 5' part of the
promoter from pSoyLea3, yielding pUC18SLBN3, and inserting
into the latter the PstI fragment containing the 3' part
of the Brazil nut cDNA from pBN2S1. The resulting final
construction, pUC18SLBN4, contains the lectin
promoter/signal sequence - Brazil nut cDNA sequence fusion
contained within a BamHI fragment.

3. Substitution of part of the hypervariable
region with sequences encoding enkephalin and protease
cleavage sites.

The Leu-enkephalin peptide has the sequence
Tyr-Gly-Gly-Phe-Leu (Hughes et al., 1975b). In order to be
able to recover the intact polypeptide from the hybrid 2S
albumin after purification, codons encoding lysine are
placed on either side of the enkephalin coding sequences.
This allows the subsequent cleavage of the enkephalin
polypeptide from the 2S albumin with the proteases
endo-Lys-C and carboxypeptidase B in the downstream
processing steps. Finally, in order for the
oligonucleotide to be capable of hybridizing to the gapped
duplex molecule during mutagenesis (see below), extra
sequences complementary to the Brazil nut sequences to be
retained are included. The exact sequence of the
oligonucleotide, determined after the study of codon usage
in several plant storage protein genes, is

5'-GCAACAGGAGAAGTACGGTGGATTCTTGAAGCAGATGCG-3'.

The substitution of part of the sequence encoding
the hypervariable region of the Brazil nut 2S albumin is
done using site-directed mutagenesis with the
oligonucleotide as primer (figs. 4 and 10). The system of
Stanssens et al. (1987) is used.
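
As a quick consistency check on the mutagenic oligonucleotide, the sketch below translates it in all three forward reading frames using the standard genetic code and looks for the lysine-flanked enkephalin motif. It is an illustration only: the function and variable names are assumptions, and the reading frame actually used in the cDNA is fixed by the flanking Brazil nut sequence (fig. 4), which is not reproduced here.

```python
# Standard genetic code, built from the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate complete codons of `dna`, starting at offset `frame`."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Mutagenic oligonucleotide as given in the text (5' to 3').
OLIGO = "GCAACAGGAGAAGTACGGTGGATTCTTGAAGCAGATGCG"
# Leu-enkephalin (YGGFL) flanked by one lysine on each side.
TARGET = "KYGGFLK"

frames = {frame: translate(OLIGO, frame) for frame in range(3)}
matching = [frame for frame, peptide in frames.items() if TARGET in peptide]

for frame, peptide in frames.items():
    print(frame, peptide)
print("frames containing", TARGET, ":", matching)
```

Only one of the three frames carries the Lys-Tyr-Gly-Gly-Phe-Leu-Lys motif, which is the frame the oligonucleotide must share with the surrounding 2S albumin coding sequence.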
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The Stanssens et al method is illustrated in fig. 
9 and recalled hereinafter. It makes use of plasmid
pMac5-8 whose restriction and genetic map is shown in fig.
8 and whose main features are also recalled hereinafter.

The positions of the relevant genetic loci of
pMac5-8 are indicated in fig. 8. The arrows denote their
functional orientation. fdT: central transcription
terminator of phage fd; F1-ORI: origin of replication of
filamentous phage f1; ORI: ColE1-type origin of replication;
BLA/ApR: region coding for β-lactamase; CAT/CmR: region
coding for chloramphenicol acetyl transferase. The
positions of the amber mutations present in pMc5-8 (the
bla-am gene does not contain the ScaI site) and pMa5-8
(cat-am: the mutation eliminates the unique PvuII site)
are indicated. Suppression of the cat amber mutation in
both supE and supF hosts results in resistance to at least
25 µg/ml Cm. pMc5-8 confers resistance to ±20 µg/ml and
100 µg/ml Ap upon amber-suppression in supE and supF
strains respectively. The EcoRI, BalI and NcoI sites
present in the wild-type cat gene (indicated with an
asterisk) have been removed using mutagenesis techniques.

The principle of the Stanssens method, as
applied to the substitution of the Leu-enkephalin peptide
for the selected hypervariable region of the 2S albumin
exemplified here, is first recalled hereafter:

Essentially the mutagenesis round used for the
above mentioned substitution is run as follows. Reference
is made to fig. 9, in which the amber mutations in the Ap
and Cm selectable markers are shown by closed circles. The
symbol represents the mutagenic oligonucleotide. The
mutation itself is indicated by an arrowhead.

The individual steps of the process are as
follows:

- Cloning of the target DNA fragment into pMa5-8
  (I). This vector carries an amber mutation in the
  CmR gene and specifies resistance to ampicillin.
- Preparation of single stranded DNA of this
  recombinant (II) from pseudoviral particles.
- Preparation of a restriction fragment from the
  complementary pMc-type plasmid (III). pMc-type
  vectors contain the wild-type Cm gene while an
  amber mutation is incorporated in the Ap
  resistance marker.
- Construction of gap duplex DNA (hereinafter called
  gdDNA) (IV) by in vitro DNA/DNA hybridization.
  In the gdDNA the target sequences are
  exposed as single stranded DNA. Preparative
  purification of the gdDNA from the other components of
  the hybridization mixture is not necessary.
- Annealing of the synthetic oligonucleotide to the
  gdDNA (V).
- Filling in the remaining gaps and sealing of the
  nicks by a simultaneous in vitro DNA
  polymerase/DNA ligase reaction (VI).
- Transformation of a mutS host, i.e., a strain
  deficient in mismatch repair, selecting for Cm
  resistance. This results in production of a mixed
  plasmid progeny (VII).
- Elimination of progeny deriving from the template
  strand (pMa-type) by retransformation of a host
  unable to suppress amber mutations (VIII).
  Selection for Cm resistance results in enrichment of
  the progeny derived from the gapped strand, i.e.,
  the strand into which the mutagenic
  oligonucleotide has been incorporated.
- Screening of the clones resulting from the
  retransformation for the presence of the desired
  mutation.

In the mutagenesis experiment depicted in the figure,
Cm resistance is used as an indirect selection for the
synthetic marker. Obviously, an experiment can be set up
such that the Ap selectable marker is exploited. In the
latter case the single stranded template (II) and the
fragment (III) are the pMc- and pMa-type, respectively. A
single mutagenesis step not only results in introduction
of the desired mutation but also in conversion of the
plasmid from pMa-type to pMc-type or vice versa. Thus,
cycling between these two configurations (involving
alternate selection for resistance to ampicillin or
chloramphenicol) can be used to construct multiple
mutations in a target sequence in the course of
consecutive mutagenesis rounds.
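
The marker logic behind this cycling can be summarised in a few lines of code. The sketch below is a deliberately simplified model, not the published protocol: each vector type is reduced to the allele state of its two resistance markers, an amber allele is assumed to confer resistance only in an amber-suppressing host, and all names (`VECTORS`, `resistances`, `mutagenesis_round`) are illustrative.

```python
# Marker alleles on the two complementary vector types described in the text.
VECTORS = {
    "pMa": {"Ap": "wt", "Cm": "amber"},  # ampicillin resistant
    "pMc": {"Ap": "amber", "Cm": "wt"},  # chloramphenicol resistant
}


def resistances(vector_type, host_suppresses_amber):
    """Return the set of antibiotics the vector confers resistance to."""
    markers = VECTORS[vector_type]
    return {
        drug
        for drug, allele in markers.items()
        if allele == "wt" or host_suppresses_amber
    }


def mutagenesis_round(template_type):
    """Return (product vector type, antibiotic selecting for the product).

    The product derives from the gapped strand, so it is the type opposite
    to the template; it is selected in a host unable to suppress amber.
    """
    product = "pMc" if template_type == "pMa" else "pMa"
    (selected,) = resistances(product, host_suppresses_amber=False)
    return product, selected


vector = "pMa"
for round_number in range(1, 4):
    vector, drug = mutagenesis_round(vector)
    print(f"round {round_number}: product is {vector}-type, selected on {drug}")
```

Running the loop shows the alternation between chloramphenicol and ampicillin selection over consecutive rounds, as described in the preceding paragraph.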

Reverting now to the present example relative to
the substitution of part of the sequence encoding the
hypervariable region of the Brazil nut 2S albumin, the
Stanssens et al. system is thus applied as follows:

The PstI-EcoRI fragment of the chimeric gene
containing the region of interest (see figs. 10, 11 and
also fig. 4) is inserted in a pMa vector which carries an
intact beta-lactamase gene and a chloramphenicol
acetyltransferase gene with an amber mutation (fig. 10), so
that the starting plasmid confers only ampicillin
resistance but not chloramphenicol resistance. Single
stranded DNA (representing the opposite strand to that
shown in figure 4) is prepared and annealed with the
EcoRI-PstI linearized form of a pMc type plasmid, yielding
a gapped duplex molecule. The oligonucleotide is annealed
to this gapped duplex. The single stranded gaps are filled
with Klenow DNA polymerase I, ligated, and the mixture
transformed into the appropriate host. Clones carrying the
desired mutation will be ampicillin sensitive but
chloramphenicol resistant. Transformants resistant to
chloramphenicol are selected and analyzed by DNA
sequencing. Finally, the hybrid gene fragment is inserted
back into the lectin/Brazil nut chimera by replacement of
the PstI-NcoI fragment in pUC18SLBN4 with the mutagenised
one from pMC58BN (fig. 11). The resulting plasmid,
pUC18SLBN5, contains the lectin promoter and signal
sequence fused to a hybrid Brazil nut-enkephalin gene, all
as a BamHI fragment.

4. Transformation of tobacco plants.

The BamHI fragment containing the chimeric gene is
inserted into the BamHI site of the binary vector pGSC1702
(fig. 12). This vector contains functions for selection
and stability in both E. coli and A. tumefaciens, as well
as a T-DNA fragment for the transfer of foreign DNA into
plant genomes (Deblaere et al., 1987). The latter consists
of the terminal repeat sequences of the octopine T-region.
The BamHI site into which the fragment is cloned is
situated in front of the polyadenylation signal of the
T-DNA gene 7. A chimeric gene consisting of the nopaline
synthase (nos) promoter, the neomycin phosphotransferase
protein coding region (neo) and the 3' end of the ocs gene
is present, so that transformed plants are rendered
kanamycin resistant. Using standard procedures (Deblaere
et al., 1987), the plasmid is transferred to the
Agrobacterium strain C58C1Rif carrying the plasmid
pGV2260. The latter provides in trans the vir gene
functions required for successful transfer of the T-DNA
region to the plant genome. This Agrobacterium is then
used to transform tobacco plants of the strain SR1 using
standard procedures (Deblaere et al., 1987). Calli are
selected on 100 µg/ml kanamycin, and resistant calli used
to regenerate plants. DNA prepared from these plants is
checked for the presence of the hybrid gene by
hybridization with the Brazil nut 2S albumin cDNA clone or
the oligonucleotide. Positive plants are grown and
processed as described below.

5. Purification of 2S albumins from seeds.

Positive plants are grown to seed, which takes
about 15 weeks. Seeds of individual plants are harvested
and homogenized in dry ice, and extracted with hexane. The
remaining residue is taken up in Laemmli sample buffer,
boiled, and put on an SDS polyacrylamide gel (Laemmli,
1970). Separated proteins are electroblotted onto
nitrocellulose sheets (Towbin et al., 1979) and assayed
with a commercially available polyclonal antibody against
the Leu-enkephalin antigen (UCB cat. # i72/001, ib72/002).

Using the immunological assays above, strongly
positive plants are selected. They are then grown in
larger quantities and seeds harvested. A hexane powder is
prepared and extracted with high salt buffer (0.5 M NaCl,
0.05 M Na-phosphate pH 7.2). This extract is then dialysed
against water, clarified by centrifugation (50,000 x g for
30 min), and the supernatant further purified by gel
filtration over a Sephadex G-75 column run in the same high
salt buffer. The proteins are further purified from
non-ionic, non-protein material by ion exchange chromatography
on a DEAE-cellulose column. Fractions containing the 2S
protein mixture are then combined, dialysed against 0.5 %
NH4HCO3, and lyophilised.

6. Recovery of Leu-enkephalin.

The mixture of purified endogenous 2S storage
proteins and hybrid 2S proteins is digested with
endo-Lys-C. In order to ensure efficient proteolytic
degradation, the 2S proteins are first oxidized with
performic acid (Hirs, 1956). The oxidation step opens the
disulfide bridges and denatures the protein. Since
Leu-enkephalin does not contain amino acid residues which
may react with performic acid, the opiate will not be
changed by this treatment. Endo-Lys-C digestion is carried
out in a 0.5 % NH4HCO3 solution for 12 hours at 37 °C and
terminated by lyophilization. This digestion liberates the
Leu-enkephalin, but still attached to the C-terminal
lysine residue. Since the hybrid protein contains very few
other lysine residues, the number of endo-Lys-C peptides
is very small, simplifying further purification of the
peptide. The enkephalin-Lys peptides are purified by
reversed phase HPLC using a C18 column (e.g.,
that commercialized under the trademark VYDAC). The
gradient consists of 0.1 % trifluoroacetic acid as initial
solvent (A) and 70 % acetonitrile in 0.1 % trifluoroacetic
acid as diluter solvent (B). A gradient of 1.5 % solvent B
in A per minute is used under the conditions disclosed by
Ampe et al. (1987). The purified enkephalin-Lys peptide
is identified by amino acid analysis and/or by
immunological techniques. It is further treated with
carboxypeptidase B as disclosed by Ambler (1972) in order
to remove the carboxyl terminal lysine residue. Finally,
the separation and purification of the opioid peptide is
achieved by reversed phase HPLC
according to the method disclosed by Lewis et al. (1979).

Other methods are available, as illustrated in
Example II.
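
The two-enzyme release described in this section can be illustrated with an idealised in silico digest. This is a sketch under stated assumptions: endo-Lys-C is modelled as cleaving after every lysine with no missed cleavages, carboxypeptidase B as removing C-terminal lysine or arginine residues, and the test fragment QQEKYGGFLKQM is simply the translation of the mutagenic oligonucleotide of section 3, standing in for the full hybrid protein, whose sequence is not reproduced here.

```python
def endo_lys_c(protein):
    """Split a one-letter protein sequence after every lysine (K)."""
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue == "K":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def carboxypeptidase_b(peptide):
    """Remove C-terminal basic residues (lysine or arginine)."""
    return peptide.rstrip("KR")


# Hypothetical stand-in for the region of the hybrid 2S albumin around the
# insert: the amino acids encoded by the mutagenic oligonucleotide.
fragment = "QQEKYGGFLKQM"

lys_c_peptides = endo_lys_c(fragment)
print("endo-Lys-C peptides:", lys_c_peptides)

enkephalin_lys = lys_c_peptides[1]
print("after carboxypeptidase B:", carboxypeptidase_b(enkephalin_lys))
```

The intermediate YGGFLK and the final YGGFL correspond to the enkephalin-Lys peptide and the free Leu-enkephalin referred to in the text.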

7. Assay of Leu-enkephalin biological activity.

Enkephalins inhibit [3H]-naloxone binding in
sodium-free homogenates of guinea pig brain. Opioid activity
can be assayed as the ability to inhibit specific
[3H]-naloxone binding to rat brain membranes (Pasternak et
al., 1975) as previously described (Simantov et al.,
1976). One unit of opioid activity ("enkephalin") was defined
as that amount that yields 50 % occupancy in a 200 µl
assay (Colquhoun et al., 1973).

Example II:

As a demonstration of the flexibility of the
technique, a procedure for the production of
Leu-enkephalin using a different 2S albumin is given. In
this case, instead of using a cDNA clone from Bertholletia
excelsa as basis for the construction, a genomic clone
isolated from Arabidopsis thaliana is used. Since a
genomic clone is used the gene's own promoter is used,
simplifying the construction considerably. To further
demonstrate the generality of the technique, the altered
2S albumin gene is brought to expression in three
different plants: tobacco, Arabidopsis and Brassica napus,
a relative of Arabidopsis which also has a 2S albumin (see
introduction). Many of the details of this example are
similar to the previous one and are thus described more
briefly.

1. Cloning of the Arabidopsis thaliana 2S albumin
gene.

Given the ease of purification of 2S albumin (see
introduction, example I), the most straightforward way to
clone the Arabidopsis 2S albumin gene is to construct
oligonucleotide probes based on the protein sequence. The
protein sequence was determined by standard techniques,
essentially in the same way as that of the Brazil nut 2S
albumin (Ampe et al., 1986). Figure 13 shows the sequence
of the 1 kb HindIII fragment containing the Arabidopsis
thaliana 2S albumin gene. The deduced protein sequence is
shown above the DNA sequence, and proteolytic processing
sites are indicated. The end of the signal sequence is
indicated, and SSU indicates small subunit. The
protein and DNA sequences of the peptide to be inserted
are shown below the cDNA sequence, as well as the rest of
the oligonucleotide to be used in the mutagenesis. During
the mutagenesis procedure the oligonucleotide shown is
hybridized to the opposite strand of the DNA sequence
shown. The NdeI site used to check the orientation of the
HindIII fragment during the construction is underlined
(bp -117). The numbering system is such that the A of the
initiation codon is taken as base pair 1.

The difficulty in using oligonucleotide probes is
that more than one codon can encode an amino acid, so that
unambiguous determination of the DNA sequence is not
possible from the protein sequence. Hence the base inosine
was used at ambiguous positions. The structure of inosine
is such that while it does not increase the strength of a
hybridization, it does not decrease it either (Ohtsuka et
al., 1985; Takahashi et al., 1985). On this basis, three
oligonucleotide probes were designed as shown in figure
14. Figure 14 shows the protein sequence of the large
subunit of the 2S albumin of Arabidopsis thaliana. Under the
protein sequence are the sequences of the oligonucleotides
used as hybridization probes to clone the gene. I designates
inosine.
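
The size of the ambiguity problem can be made concrete by counting how many distinct DNA sequences can encode a given peptide under the standard genetic code. The sketch below is illustrative only: the probes of figure 14 are not reproduced here, so Leu-enkephalin is used as a stand-in peptide, and the names `DEGENERACY` and `n_coding_sequences` are assumptions.

```python
from collections import Counter
from math import prod

# Standard genetic code in canonical T, C, A, G codon order; each letter is
# the amino acid (or "*" for stop) encoded by one of the 64 codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons per amino acid, e.g. 6 for Leu (L) and 1 for Met (M).
DEGENERACY = Counter(AMINO)


def n_coding_sequences(peptide):
    """Count the DNA sequences that encode `peptide` (one-letter code)."""
    return prod(DEGENERACY[residue] for residue in peptide)


# Stand-in peptides (not the actual probe regions of figure 14).
for peptide in ("YGGFL", "KYGGFLK", "MW"):
    print(peptide, n_coding_sequences(peptide))
```

Even a five-residue peptide such as YGGFL has several hundred possible coding sequences, which is why a neutral base at the ambiguous positions is preferable to synthesising every variant.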

The three oligonucleotides were used to screen a
genomic library of Arabidopsis DNA constructed in the
phage Charon 35 (Loenen and Blattner, 1983) using standard
methods (Maniatis et al., 1982; Benton and Davis, 1977).
The oligonucleotides were kinased (Miller and Barnes,
1986), and hybridizations were done in 5X SSPE (Maniatis
et al., 1982), 0.1 % SDS, 0.02 % Ficoll, 0.02 %
polyvinylpyrrolidone, and 50 µg/ml sonicated herring sperm
DNA at 45 °C. Filters were washed in 5X SSPE, 0.1 % SDS at
45 °C for 4-8 minutes. Using these conditions, a
clone was isolated which hybridized with all three
oligonucleotide probes. Appropriate regions were subcloned
into pUC18 (Yanisch-Perron et al., 1985) using standard
techniques (Maniatis et al., 1982) and sequenced using the
methodology of Maxam and Gilbert (1980). The sequence of
the region containing the gene is shown in figure 13.

2. Substitution of part of the hypervariable
region with sequences encoding enkephalin and protease
cleavage sites.

The gene isolated above was used directly for
construction of a Leu-enkephalin/2S albumin chimera. As in
the first example, an oligonucleotide was designed
incorporating the Leu-enkephalin sequence and lysine encoding
codons on either side of it, in order to be able to recover the
enkephalin polypeptide in the downstream processing steps,
and extra sequences complementary to the flanking
Arabidopsis sequences in order for the oligonucleotide to
be able to hybridize to the gapped duplex molecule during
the mutagenesis. The resultant oligonucleotide has the
sequence:

5'-CAAGCTGCCAAGTACGGTGGATTCTTGAAGCAGCACCAAC-3'

Its position in the sequence is shown in figure 8.
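
The layout of the two mutagenic oligonucleotides given in this disclosure can be compared directly. The sketch below locates the shared 21-nucleotide block encoding Lys-Tyr-Gly-Gly-Phe-Leu-Lys in each oligonucleotide and reports what lies on either side of it. The names are illustrative, and treating everything outside that block as a hybridization arm is an assumption made for the sketch: the source does not state whether any of the flanking codons were already present in the native genes.

```python
# Mutagenic oligonucleotides as given in the text (5' to 3').
OLIGOS = {
    "Brazil nut (example I)": "GCAACAGGAGAAGTACGGTGGATTCTTGAAGCAGATGCG",
    "Arabidopsis (example II)": "CAAGCTGCCAAGTACGGTGGATTCTTGAAGCAGCACCAAC",
}

# Codons for Lys-Tyr-Gly-Gly-Phe-Leu-Lys as they appear in both oligos.
INSERT = "AAGTACGGTGGATTCTTGAAG"


def split_arms(oligo, insert=INSERT):
    """Return (5' arm, insert, 3' arm); raise if the insert is absent."""
    start = oligo.find(insert)
    if start < 0:
        raise ValueError("insert not found in oligonucleotide")
    return oligo[:start], insert, oligo[start + len(insert):]


for name, oligo in OLIGOS.items():
    five_prime, insert, three_prime = split_arms(oligo)
    print(name)
    print("  5' arm:", five_prime, len(five_prime), "nt")
    print("  insert:", insert, len(insert), "nt")
    print("  3' arm:", three_prime, len(three_prime), "nt")
```

Both oligonucleotides carry an identical coding block and differ only in the flanking sequences, which are what direct each oligonucleotide to its own target gene.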

The region containing the gene and sufficient
flanking regions to include all necessary regulatory
signals is contained on a 3.6 kb BglII fragment,
inserted in the cloning vector pJB65 (Botterman et al.,
1987). The clone is called pAT2S1Bg. The region to be
mutagenized is contained on a 1 kb HindIII fragment
within the 3.6 kb BglII fragment, and this smaller
fragment is inserted into the HindIII site of the pMa5-8
vector of Stanssens et al. (1987) (fig. 5c). The
orientation is checked using an asymmetric NdeI site
(figure 8). The mutagenesis is carried out using exactly
the strategy described in step 3 of example I.

Subsequently the hybrid gene is reinserted into the
larger fragment by replacing the original HindIII fragment
with the mutagenized one using standard
techniques (Maniatis et al., 1982). The orientation is
again checked using the NdeI site.

3. Transformation of plants.

The BglII fragment containing the hybrid gene and
sufficient flanking sequences both 5' and 3' to the
coding region to insure that appropriate signals for
gene regulation are present is inserted into the BamHI
site of the same binary vector, pGSC1702, used in
example I (figure 12). This vector is described in
section 4 of example I. Transformation of tobacco plants
is done exactly as described there. The techniques for
transformation of Arabidopsis thaliana and Brassica
napus are such that exactly the same construction, in
the same vector, can be used. After mobilization to
Agrobacterium tumefaciens as described in section 4 of
example I, the procedures of Lloyd et al. (1986) and
Klimaszewska et al. (1985) are used for transformation
of Arabidopsis and Brassica respectively. In each case,
as for tobacco, calli can be selected on 100 µg/ml
kanamycin, and resistant calli used to regenerate
plants. DNA prepared from such plants is checked for the
presence of the hybrid gene by hybridization with the
oligonucleotide used in the mutagenesis. (In the case of
tobacco and Brassica, larger portions of the hybrid
construct could be used, but in the case of
Arabidopsis these would hybridize with the endogenous
gene.)

In the embodiment of the invention, the BglII fragment
containing the hybrid gene and sufficient flanking
sequences both 5' and 3' to the coding region to insure
that appropriate signals for gene regulation are present
is inserted into the BglII site of the binary vectors
pGSC1703 (Fig. 15) or pGSC1703A (Fig. 16). pGSC1703
contains functions for selection in both E. coli and
Agrobacterium, as well as the T-DNA fragments allowing
the transfer of foreign DNA into plant genomes (Deblaere
et al., 1987). It further contains the bidirectional
promoter TR (Velten et al., 1984) with the neomycin
phosphotransferase protein coding region (NPTII) and the
3' end of the ocs gene. It does not contain a gene
encoding ampicillin resistance, as pGSC1702 does, so
that carbenicillin as well as claforan can be used to
kill the Agrobacteria after the infection step. Vector
pGSC1703A contains the same functions as vector
pGSC1703, with an additional gene encoding hygromycin
transferase. This allows the selection of the
transformants on both kanamycin and hygromycin.
Transformation of tobacco plants is done exactly as
described in section 4 of Example I, whereby the hybrid
gene is inserted into the plant transformation vector
pGSC1703. Transformation of Arabidopsis thaliana and
Brassica napus was done with pGSC1703A in which the
hybrid AT2S1 gene has been inserted. After mobilization
to Agrobacterium tumefaciens C58C1Rif carrying the
plasmid pMP90 (Koncz and Schell, 1986), which
provides in trans the vir gene functions but which does
not carry a gene encoding ampicillin resistance, the
procedures of Lloyd et al. (1986) and Klimaszewska et
al. (1985) are used for transformation of Arabidopsis
and Brassica respectively. Carbenicillin is used to kill
the Agrobacterium after co-cultivation has occurred. In each
case, as for tobacco, calli can be selected on
100 µg/ml kanamycin, and resistant calli used to
regenerate plants. DNA prepared from such plants is
checked for the presence of the hybrid gene by
hybridization with the oligonucleotide used in the
mutagenesis. (In the case of tobacco, larger portions of
the hybrid construct could be used, but in the case of
Brassica and Arabidopsis these would hybridize with the
endogenous gene.)
4. Purification of 2S albumins from seeds and 
further processing 

Positive plants from each species are grown to 
seed. In the case of tobacco this takes about 15 weeks, 
while for Arabidopsis and Brassica approximately 6 weeks 
and 3 months respectively are required. Use of different 
varieties may alter these periods. Purification of 2S 
albumins from seeds, recovery of the Leu-enkephalin, and 
assaying the latter for biological activity are done as 
follows.

Methods used for the isolation of enkephalin from
Arabidopsis seeds

Two methods were used to isolate enkephalin from
Arabidopsis seeds. First, a small amount of seeds
isolated from several individual transformants was
screened for the presence of chimeric 2S albumins. This
is done because, as described by Jones et al. (1985),
expression of introduced genes may vary widely between
individual transformants. Seeds from individual plants
selected by this preliminary screening were then used to
isolate larger amounts and determine yields more
accurately. Both procedures are described below.
A) Fast screening procedure for Enkephal in-conta ining 2S 
proteins 

Seeds of individual plants (approximately 50 mg) 
were collected and ground in an Eppendorf tube with a 
small plastic grinder shaped to fit the tube. No dry 
ice is used in this procedure. The resulting paste was 
extracted three times with 1 ml of heptane and the 
remaining residue dried. The powder was suspended in 
0.2 ml of 1M NaCl and centrifuged for 5 min in an 
Eppendorf centrifuge. This extraction was repeated 
three times and the supernatants combined, giving a
total volume of approximately 0.5 ml. This solution was
diluted 20 fold with water, giving a final NaCl
concentration of 0.05M. This was stored overnight at
4 °C and then spun at 5000 rpm in a Sorvall SS-34 rotor
for 40 min. The resulting supernatant was passed over a 
disposable C18 cartridge (SEP-PAC, Millipore, Milford, 
Massachusetts, U.S.A.). The cartridges were loaded by 
injecting the 10ml supernatant with a syringe through 
the columns at a rate of 5 ml/min. The cartridge was 
then washed with 2 ml of 0.1% TFA and proteins were 
desorbed by a step elution with 2 ml portions of a 0.1% 
TFA solution containing 7%, 14%, 21% etc. up to 70% 
acetonitrile. The fractions eluting in the range from 
28% to 49% acetonitrile are enriched for 2S albumins as 
judged by SDS-polyacrylamide gel analysis performed on
aliquots taken from the different fractions. The 2S
albumin-containing fractions were combined and dried in
a Speed Vac concentrator (Savant Instruments).
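
The dilution and step-elution arithmetic of the screening procedure can be sketched as follows. This is only an illustrative calculation; the function names are hypothetical and not part of the described method.

```python
def diluted_molarity(stock_molar, fold):
    """Concentration after an n-fold dilution with water."""
    return stock_molar / fold


def step_elution_series(step_pct=7, max_pct=70):
    """Acetonitrile percentages of the successive 2 ml elution steps."""
    return list(range(step_pct, max_pct + 1, step_pct))


# About 0.5 ml of combined 1M NaCl extract, diluted 20 fold.
combined_extract_ml = 0.5
final_volume_ml = combined_extract_ml * 20
final_nacl_molar = diluted_molarity(1.0, 20)

# Steps falling in the 28% to 49% window reported as enriched for 2S albumins.
steps = step_elution_series()
enriched_steps = [p for p in steps if 28 <= p <= 49]

print(final_volume_ml, final_nacl_molar, enriched_steps)
```

Under these figures the diluted sample is the 10 ml loaded on the cartridge, and four of the ten elution steps fall in the enriched window.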

The combined fractions were reconstituted in 0.95 
ml 0.1% TFA in water, filtered through an HV-4 Millex
filter (Millipore), and applied to a reversed phase C4
column 25 cm in length and 0.46 cm in diameter (Vydac
214TP54, pore size 300 angstrom, particle size 5 µm).
The HPLC equipment consisted of 2 pumps (model 510), a
gradient controller (model 680) and an LC
spectrophotometer detector (Lambda-Max model 481, all
from Waters, Milford, Massachusetts, U.S.A.). The
gradients were run as follows: Solution A was 0.1% TFA
in H2O, solution B 0.1% TFA in 70% CH3CN. For 5
minutes, a solution of 0% B, 100% A was run over the 
column, after which the concentration of B was raised to 
100% in a linear fashion over 70 minutes. The column 
eluate was detected by absorbance at 214 nm. The 
fractions containing 2S albumins were collected and 
dried in a Speed Vac concentrator. 
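
The gradient program described above (5 minutes at 0% B, then a linear rise to 100% B over 70 minutes, with B containing 70% CH3CN) can be expressed as a small function. This is a sketch for orientation only; the function names are hypothetical and instrument dead volume is ignored.

```python
def percent_b(t_min, hold_min=5.0, ramp_min=70.0):
    """Nominal %B delivered at time t_min for the hold-then-ramp program."""
    if t_min <= hold_min:
        return 0.0
    return min(100.0, (t_min - hold_min) / ramp_min * 100.0)


def percent_acetonitrile(t_min, b_acetonitrile_pct=70.0):
    """Nominal acetonitrile content of the eluent, since B is 70% CH3CN."""
    return percent_b(t_min) * b_acetonitrile_pct / 100.0


for t in (0, 5, 40, 75):
    print(t, percent_b(t), percent_acetonitrile(t))
```

With these assumptions the eluent reaches 70% acetonitrile 75 minutes after injection, i.e. the ramp corresponds to 1% acetonitrile per minute.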

In order to obtain a more complete digestion with 
proteases it is recommended that the proteins be 
denatured by oxidizing the disulfide bridges with 
performic acid. This is done by adding 0.5 ml of a 
solution made by mixing 9 ml of formic acid and 1 ml of 
30% H2O2 at room temperature. The solution was made 2
hours before use. The reaction is allowed to proceed
for 30 min at 0 °C and terminated by drying in a Speed
Vac concentrator. Traces of remaining performic acid
were removed by twice adding 500 µl of water and
lyophilizing the sample. 

The residue was redissolved in 0.75 ml of 0.1M 
Tris-HCl pH 8.5 after which 4 µg of TPCK-treated
trypsin (Worthington) was added. The reaction was
placed at 37 °C for 3 hours, after which it was
terminated by the addition of 10 µl of TFA and stored at
-20 °C prior to analysis. The resulting peptide mixture
is separated by HPLC using the columns and gradient 
mixtures described above. As a standard, a peptide of 
the same sequence as that expected (YGGFLK) was 
synthesized using standard techniques on a Biolynx 4175 
peptide synthesizer (LKB). This peptide was run over
the column and the retention time determined. The
mixture of peptides resulting from the trypsin digest
was then loaded on the same column and peptides with the
same retention time as the standard were collected,
dried, and reloaded on a C18-reversed phase column.
The elution time of the marker peptide again served
as a reference for the correct position of the
enkephalin containing peptide. The identity of this
peptide was confirmed by amino acid sequencing,
which also allowed a rough quantitation. Four plants
of the six transformants analyzed were shown to
contain significant quantities of Leu-enkephalin.
By way of example the detailed analysis and
processing steps are given below for one of these
four plants.

B) Larger scale isolation and processing of Enkephalin
from Arabidopsis seeds 
Grinding and initial extraction 

2.11 g of seeds from said plant were ground in a 
mortar in dry ice. Lipids were removed from the 
resulting powder by extracting three times with 5 ml of 
heptane. The resulting residue was dried. 
Protein extraction 

The powder was dissolved in approximately 4 ml of 
1.0M NaCl. The resulting paste was spun in an SS-34 
rotor at 17,500 rpm for 40 min. After each spin the
supernatant was transferred to a fresh tube and the
pellet again resuspended in 4 ml of 1.0M NaCl. This
procedure was repeated three times. The three
supernatants (12 ml total) were passed through a 0.45 µm
filter (HA, Millipore).

Isolation of 2S albumins via gel filtration 

The 12 ml of solution from the previous step was 
passed over a Sephadex G-50 medium (Pharmacia) column in 
two batches of 6ml. The column was 2.5 cm in diameter, 
100 cm in length, and run at a flow rate of 
approximately 27 ml/hr in 0.5M NaCl. Fractions of 
approximately 7 ml were collected. The fractions were 
monitored for the protein in two ways. First, total 
protein was detected by applying 10 µl of each fraction
on a piece of Whatman 3MM paper, indicating the fraction
numbers with a pencil. The spots are dried for 1 min in
warm air and the proteins fixed by a quick (30 sec)
immersion of the paper sheet in a 10% TCA solution. The
sheet is then transferred to a Coomassie Blue solution
similar to that used for polyacrylamide gel staining.
After 1 min, the paper is removed and rinsed with tap
water. Protein containing fractions show a blue spot on
a white background. The minimum detection limit of the
technique is about 0.05 mg/ml. Those fractions
containing protein were assayed for the presence of 2S
albumins by adding 2 µl of the 7 ml fraction to 10 µl of
sample buffer and then loading 6 µl of this mixture on a
17.5% polyacrylamide minigel. Those fractions shown to
contain 2S albumins were pooled; the total volume of the
pooled fractions was 175 ml.
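
For orientation, the geometry and timing of the gel filtration run can be estimated from the stated column dimensions, flow rate and fraction size. The sketch below is a rough calculation under the assumption of a fully packed cylindrical bed; the helper names are hypothetical.

```python
import math


def bed_volume_ml(diameter_cm, length_cm):
    """Volume of a cylindrical column bed (1 cm^3 = 1 ml)."""
    return math.pi * (diameter_cm / 2) ** 2 * length_cm


def minutes_per_fraction(fraction_ml, flow_ml_per_hr):
    """Time needed to collect one fraction at a constant flow rate."""
    return fraction_ml / flow_ml_per_hr * 60


column_ml = bed_volume_ml(2.5, 100)        # 2.5 cm x 100 cm Sephadex G-50 column
fraction_min = minutes_per_fraction(7, 27)  # 7 ml fractions at about 27 ml/hr
pooled_fractions = 175 / 7                  # 175 ml pool of 7 ml fractions

print(round(column_ml), round(fraction_min, 1), pooled_fractions)
```

On these figures the bed volume is roughly 490 ml, one fraction is collected about every 15.6 minutes, and the 175 ml pool corresponds to 25 fractions across the two 6 ml batches.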
Desalting of the isolated 2S albumins 

This was done via HPLC over a C4 column 25 cm in
length and 0.46 cm in diameter (Vydac 214TP54, pore size
300 angstrom, particle size 5 µm). The HPLC equipment
consisted of 2 pumps (model 510), a gradient controller
(model 680) and an LC spectrophotometer detector
(Lambda-Max model 481, all from Waters, Milford,
Massachusetts, U.S.A.). 21 ml of the 175 ml were loaded
on this system in 6 runs of 3.5 ml each. The gradients
were run as follows: Solution A was 0.1% TFA in H2O,
solution B 0.1% TFA in 70% CH3CN. For 5 minutes, a
solution of 0% B, 100% A was run over the column, after 
which the concentration of B was raised to 100% in a 
linear fashion over 70 minutes. During each run the 2S 
albumin fraction was collected, and after all 6 runs 
these fractions pooled and divided into 3 tubes, each of 
which therefore contained 7/175 of the 2S albumins from 
the 2.11 g seeds. Each of the aliquots was processed
further separately and used for quantitative estimation 
of yields. 
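
The aliquot bookkeeping behind the "7/175" figure can be checked with exact fractions. This is a minimal sketch; the variable names are hypothetical, and the seed-equivalent value assumes the 2S albumins were evenly distributed in the 175 ml pool.

```python
from fractions import Fraction

pool_ml = 175        # pooled gel filtration fractions
loaded_ml = 6 * Fraction(7, 2)  # 6 HPLC runs of 3.5 ml each
tubes = 3
seed_g = 2.11

# Share of the total 2S albumin present in each of the three tubes.
per_tube = loaded_ml / pool_ml / tubes

# Amount of seed that each tube represents.
seed_equivalent_g = seed_g * float(per_tube)

print(loaded_ml, per_tube, round(seed_equivalent_g, 4))
```

Each tube therefore holds 7/175 (1/25, or 4%) of the total, equivalent to about 0.084 g of seed, which is the basis for scaling the later yield figures to a per-gram value.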
Trypsin Digest 

Prior to digestion with trypsin the three aliquots
were oxidized as described above. The trypsin digest
was carried out essentially as described above. 0.95 ml
of 0.1M Tris-HCl pH 8.5 was added to each aliquot, which
was supplemented with 50 µg of trypsin (Worthington) and
the reaction allowed to proceed for 4 hr at 37 °C.
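
The logic of releasing YGGFLK with trypsin, and of the later carboxypeptidase B step that removes the extra lysine, can be illustrated in silico. The sketch below uses the common simplified rule that trypsin cleaves after K or R unless the next residue is P; the flanking residues in the example are hypothetical and do not reproduce the actual chimeric 2S albumin sequence.

```python
def trypsin_digest(seq):
    """Split a protein after each K or R that is not followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def carboxypeptidase_b(peptide):
    """Remove one C-terminal basic residue (K or R), if present."""
    if peptide and peptide[-1] in "KR":
        return peptide[:-1]
    return peptide


# Hypothetical context: enkephalin preceded by a lysine and followed by one.
toy_hybrid = "AAK" + "YGGFLK" + "AAQ"
peptides = trypsin_digest(toy_hybrid)
enkephalin = carboxypeptidase_b("YGGFLK")

print(peptides, enkephalin)
```

In this toy case the digest yields the intermediate YGGFLK, and trimming its C-terminal lysine gives YGGFL, the Leu-enkephalin sequence.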
Isolation of the YGGFLK peptide 

The enkephalin peptide containing the carboxyl 
terminal lysine residue was isolated using two 
sequential HPLC steps. As described in the small scale
isolation procedure above, a peptide of the same
sequence as that expected was synthesized and run over
an HPLC system using the same column and gradient
conditions described in the desalting step above. The
retention time of the synthetic peptide was determined
(Fig. 17A). The three trypsin digests were then
(separately) loaded on the same column and the material 
with the same retention time as that of the synthetic
peptide collected (the hatched area in Fig. 17B) and 
dried. The same procedure was then followed using the 
same equipment and gradients except that a C18 column 
(25 x 0.46 cm, Vydac 218TP104 material of pore size 300 
angstrom and particle size 10 µm) was used. Again
material with the same retention time as the synthetic
peptide was collected (Fig. 18A and 18B). This resulted
in three preparations each derived from 7/175 of the 
total 2S albumin. 

1/20 of the material in one of these three aliquots 
was used to check the sequence of the isolated peptide. 
This was determined by automated gas-phase sequencing 
using an Applied Biosystems Inc. (U.S.A.) 470A gas-phase 
sequenator. The stepwise liberated phenylthiohydantoin
(PTH) amino acid derivatives were analyzed by an on-line
PTH-amino acid analyzer (Applied Biosystems Inc. 120A).
The sequenator and PTH-analyzer were operated according
to the manufacturer's instructions. The
HPLC-chromatograms of the liberated PTH-amino acids from
cycles 1 through 6 are shown in figure 19. The sequence
was, as expected, YGGFLK. The yield of PTH-amino acid of
the first cycle was used to calculate the yield of this
intermediate peptide (251-277 nmol/g seed).
Removal of the extra Lysine from the enkephalin 

The three aliquots resulting from the previous step 
were resuspended in 100 µl of 0.2M N-Ethylmorpholine pH
8.5 (Janssen Chimica, Belgium) and one third of each
treated with 0.2 µg of carboxypeptidase B (Boehringer
Mannheim, sequencing grade) at 37 °C. The three aliquots
were treated for 5, 12, and 17 minutes respectively, but 
all three digests proved to be equally effective. After 
digestion the enkephalin was purified by HPLC using the 
same equipment, column, and gradients as described under 
desalting above.

The final yield of enkephalin was determined by
doing an amino acid analysis. An aliquot representing
1/150 of the total amount of the above mentioned three
aliquots was hydrolyzed in 400 µl of 6N HCl, 0.05%
phenol at 110 °C for 24 h. The hydrolysate was dried and
amino acids derivatized into phenylthiocarbamoyl (PTC)
residues (Bidlingmeyer et al., 1984). Three separate
aliquots of the PTC residue mixture were quantified
using the PICO-TAG amino acid analysis system (Waters,
Millipore, Milford, Massachusetts, U.S.A.). Yields of
enkephalin peptide were calculated for each of the three
samples using alpha amino-butyric acid as an internal 
standard. Based on an average of the three 
determinations a final yield of 206 nmol enkephalin/g 
seed was calculated. 
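
To put the reported yields in perspective, the final figure (206 nmol/g seed) can be compared with the intermediate YGGFLK figure (251-277 nmol/g seed) and converted to a mass. The sketch below is a rough estimate only: the two values were obtained by different methods, and the molar mass of Leu-enkephalin (about 555.6 g/mol) is an outside value assumed here, not taken from the text.

```python
FINAL_NMOL_PER_G = 206.0
INTERMEDIATE_NMOL_PER_G = (251.0, 277.0)
LEU_ENKEPHALIN_G_PER_MOL = 555.6  # assumed molar mass of YGGFL


def recovery_range(final, intermediate_range):
    """Fraction of the intermediate recovered as final peptide (low, high)."""
    low, high = intermediate_range
    return final / high, final / low


def micrograms_per_gram_seed(nmol_per_g, g_per_mol):
    """nmol x g/mol gives ng; divide by 1000 for micrograms."""
    return nmol_per_g * g_per_mol / 1000.0


recovery_low, recovery_high = recovery_range(FINAL_NMOL_PER_G, INTERMEDIATE_NMOL_PER_G)
mass_ug = micrograms_per_gram_seed(FINAL_NMOL_PER_G, LEU_ENKEPHALIN_G_PER_MOL)

print(round(recovery_low, 2), round(recovery_high, 2), round(mass_ug, 1))
```

Under these assumptions roughly three quarters or more of the intermediate peptide is accounted for in the final product, corresponding to a little over 0.1 mg of enkephalin per gram of seed.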

The identity of the peptide finally obtained was 
verified in three ways. First, its amino acid
composition showed molar ratios of Gly, 1.76;
Tyr, 1.00; Leu, 1.15 and Phe, 1.02. Secondly, its
retention time on a reversed phase HPLC column matched
that of a reference enkephalin peptide (fig. 20),
and finally its amino acid sequence was determined.
These criteria unambiguously identify the peptide 
isolated from chimeric 2S albumins as being Leu- 
enkephalin. 
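
The composition check can be made explicit by comparing the measured molar ratios with the composition expected for YGGFL. The following is an illustrative sketch only; it uses one-letter codes and a simple rounding criterion that is an assumption of this example, not a criterion stated in the text.

```python
from collections import Counter

# Expected residue counts for Leu-enkephalin (YGGFL).
expected = Counter("YGGFL")

# Measured molar ratios, normalized to Tyr = 1.00.
observed = {"G": 1.76, "Y": 1.00, "L": 1.15, "F": 1.02}

# Relative deviation of each measured ratio from the expected count.
deviation = {aa: (observed[aa] - expected[aa]) / expected[aa] for aa in expected}

# Nearest-integer composition implied by the measurement.
rounded = {aa: round(value) for aa, value in observed.items()}

print(deviation, rounded == dict(expected))
```

The measured ratios round to Gly 2, Tyr 1, Leu 1, Phe 1, with individual deviations of at most about 15%.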
Example III:

As a third example of the method described, a 
procedure is given for the production of two growth 
hormone releasing factor (GHRF) analogs. Synthetic and 
natural analogs of the originally isolated 44 amino acid 
peptide (Guillemin et al., 1982) in which the methionine
at position 27 has been replaced by a leucine and in
which the carboxyl terminus is modified in various ways
or even shortened by four amino acids have been shown to
be active (Kempe et al., 1986; Rivier et al., 1982). In 
this case two different analogs, designated hereafter as 
GHRFL and GHRFS, are produced. Both analogs incorporate
the substitution of leucine for methionine at position
27. GHRFL is produced in such a way that the carboxyl
terminus is Leu-NH2, as is found in a natural form of the
peptide (Guillemin et al., 1982). GHRFS ends in
Arg-Hse-NH2, where Hse stands for homoserine. This analog
was shown to be biologically active by Kempe et al.
(1986). Both analogs are flanked by methionine codons
in the 2S albumin so that they can be cleaved out by
treatment with CNBr. This is possible as neither analog
contains an internal methionine. After isolation of the
two peptides using HPLC techniques they are chemically
modified to result in the Leu-NH2 and Arg-Hse-NH2
carboxyl termini.
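
The reason the insert must not contain an internal methionine can be shown with a toy model of CNBr cleavage, which cuts on the carboxyl side of each methionine. In the sketch below the flanking sequences and the insert ("PEPTIDE") are purely hypothetical placeholders, not the GHRF or 2S albumin sequences, and the C-terminal methionine of each fragment stands for the homoserine or homoserine lactone it is converted to.

```python
def cnbr_fragments(protein):
    """Split a protein after every methionine (M)."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        if aa == "M":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


def release_insert(n_flank, insert, c_flank):
    """Flank an insert with methionines and return the CNBr fragments."""
    if "M" in insert:
        raise ValueError("insert contains an internal methionine")
    return cnbr_fragments(n_flank + "M" + insert + "M" + c_flank)


fragments = release_insert("AAAA", "PEPTIDE", "GGGG")
print(fragments)
```

The middle fragment carries the insert together with the residue derived from the downstream methionine, which is why a further step is needed to obtain the final carboxyl terminus.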

A set of synthetic oligonucleotides encoding the
two GHRF analogs and CNBr cleavage sites is substituted
for essentially the entire hypervariable region in a
genomic clone encoding the 2S albumin of Arabidopsis
thaliana. Only a few amino acids adjacent to the sixth
and seventh cysteine residues remain. This chimeric
gene is under the control of its natural promoter and
signal peptide. The process and constructions are
diagrammatically illustrated in Fig. 21 and 22. The
entire construct is transferred to tobacco, Arabidopsis
thaliana and Brassica napus plants using an
Agrobacterium mediated transformation system. Plants
are regenerated, and after flowering the seeds are
collected and the 2S albumins purified. The GHRF
peptides are cleaved from the 2S albumin using CNBr,
the cleavage sites for which are built into the
oligonucleotides, and then recovered using HPLC techniques.

1. Cloning of the Arabidopsis thaliana 2S albumin gene

The Arabidopsis thaliana gene has been cloned 
according to what is described in Example II (see also 
Krebbers et al., 1988). As already of record, the 
plasmid containing said gene is called pAT2S1. The
sequence of the region containing the gene, which is 
called AT2S1, is shown in figure 13. 

2. Deletion of the hypervariable region of AT2S1 
gene and replacement by an AccI site 
Part of the hypervariable region of AT2S1 is 
replaced by the following oligonucleotide: 

5'- CCA ACC TTG AAA GGT ATA CAC TTG CCC AAC -3'   30-mer

     P   T   L   K   G   I   H   L   P   N

in which the underlined sequence (GTATAC) represents the
AccI site and the surrounding ones sequences complementary
to the coding sequence of the hypervariable region of the
Arabidopsis 2S albumin gene to be retained. This
results finally in the amino acid sequence indicated
under the oligonucleotide.
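
The 30-mer can be checked in silico: translating it with the standard genetic code should give the peptide written beneath it, and it should contain a single AccI recognition sequence (GT, then A or C, then G or T, then AC). The following is a minimal sketch using only the standard library; the helper names are hypothetical.

```python
import re

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA sequence read in frame from its first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


oligo = "CCA ACC TTG AAA GGT ATA CAC TTG CCC AAC".replace(" ", "")
peptide = translate(oligo)
acci_sites = [m.start() for m in re.finditer(r"GT[AC][GT]AC", oligo)]

print(len(oligo), peptide, acci_sites)
```

The translation agrees with the sequence PTLKGIHLPN, and the only AccI site is the GTATAC spanning the GGT ATA CAC codons.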

The deletion and substitution of part of the 
sequence encoding the hypervariable region of AT2S1 is 
done using site directed mutagenesis with the 
oligonucleotide as primer. The system of Stanssens et
al. (1987) is used as described in Example I.

The individual steps of the process are as follows: 

- Cloning of the HindIII fragment of pAT2S1
containing the coding region of the AT2S1 gene
into pMa5-8 (I). This vector carries an amber
mutation in the CmR gene and specifies resistance
to ampicillin. The resulting plasmid is
designated pMacAT2S1 (see figure 21 step 1).

- Preparation of single stranded DNA of this
recombinant (II) from pseudoviral particles.

- Preparation of a HindIII restriction fragment
from the complementary pMc type plasmid (III).
pMc-type vectors contain the wild type CmR gene
while an amber mutation is incorporated in the Ap
resistance marker.

- Construction of gap duplex DNA (hereinafter
called gdDNA) (IV) by in vitro DNA/DNA
hybridization. In the gdDNA the target sequences
are exposed as single stranded DNA. Preparative
purification of the gdDNA from the other
components of the hybridization mixture is not
necessary.

- Annealing of the 30-mer synthetic oligonucleotide
to the gdDNA (V).

- Filling in the remaining single stranded gaps and
sealing of the nicks by a simultaneous in vitro
Klenow DNA polymerase I/DNA ligase reaction (VI).

- Transformation of a mutS host, i.e., a strain 
deficient in mismatch repair, selecting for Cm 
resistance. This results in production of a mixed 
plasmid progeny (VII).

- Elimination of progeny deriving from the template
strand (pMa-type) by retransformation of a host
unable to suppress amber mutations (VIII).
Selection for Cm resistance results in enrichment 
of the progeny derived from the gapped strand, 
i.e., the strand into which the mutagenic 
oligonucleotide has been incorporated. 

- Screening of the clones resulting from the 
retransformation for the presence of the desired 
mutation. The resulting plasmid containing the
deleted hypervariable region of AT2S1 is called
pMacAT2S1C40 (see figure 21 step 2).

3. Insertion of sequences encoding GHRF into the
AT2S1 gene whose sequences encoding the hypervariable
region have been deleted

As stated above when the sequences encoding most of 
the hypervariable loop were removed an AccI site was 
inserted in its place. The sequences of interest will 
be inserted into this AccI site, but a second AccI site 
is also present in the HindIII fragment containing the
modified gene. Therefore the NdeI-HindIII fragment
containing the modified gene is subcloned into the
cloning vector pBR322 (Bolivar, 1977) also cut with NdeI
and HindIII. The position of the NdeI site in the 2S
albumin gene is indicated in figure 4. The resulting
subclone is designated pBRAT2S1 (Figure 21, step 3).
Sequences encoding the two versions of the growth
hormone releasing factor are inserted into the AccI site
of pBRAT2S1 by constructing a series of complementary
synthetic oligonucleotides which, when annealed, form the
complete sequence of the GHRF. The codon usage was chosen
to approximately match that of AT2S1, a restriction site
(StyI) to be used for diagnostic purposes was included,
and at the ends of the GHRF encoding sequences staggered
ends complementary to BamHI and PstI sites were
included, along with extra bases to ensure that after
the steps described below, the reading frame of the 2S
albumin gene would be maintained. The eight
oligonucleotides used in the two constructions are shown
in figure 22. In figure 22A the limits of the 
oligonucleotides are indicated by the vertical lines, 
and the numbers above and below the sequence indicate 
their numbers. In oligonucleotides 4 and 8 the bases 
enclosed in the box are excluded, resulting in the GHRFS
version of the construction. The bases marked by an *
in figure 22A were found to have mutated to a T in the
clone used for the further construction of GHRFL (pEK7),
but as these changes did not affect the amino acid
sequence the changes were not corrected. The peptide
sequence of the GHRF peptide and the methionines
included to provide CNBr sites are shown above the DNA
sequence. The overhanging bases at each end serve to
ligate the fragments into BamHI and PstI sites. These
are removed by the S1 digestion. The blunt end fragment
is then ligated into the Klenow treated AccI site of
pBRAT2S1 as shown in Fig. 22B. The reading frame
context of the AccI site is shown in the upper part of
the figure, the cleavage sites being indicated by a 1.
The results of the manipulation are below, with the
bases resulting from the AccI site and its filling in
shown in bold type.
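
The requirement that the blunt-ended insert preserve the reading frame of the 2S albumin gene can be illustrated with a simple check: the insert length must be a multiple of three, otherwise the codons downstream of the insertion point change. The sequences below are short hypothetical placeholders, not those of figure 22.

```python
def codons(seq):
    """Split a sequence into complete codons, discarding trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def downstream_codons(upstream, insert, downstream):
    """Codons read over the downstream region after insertion."""
    n = len(downstream) // 3
    return codons(upstream + insert + downstream)[-n:]


upstream = "CCAACCTTG"          # hypothetical in-frame sequence before the site
downstream = "TTGCCCAAC"        # hypothetical in-frame sequence after the site
insert_ok = "ATGTACGCTATG"      # 12 bases, a multiple of three
insert_short = insert_ok[:-1]   # one base lost, e.g. by over-digestion

frame_kept = downstream_codons(upstream, insert_ok, downstream) == codons(downstream)
frame_lost = downstream_codons(upstream, insert_short, downstream) != codons(downstream)

print(len(insert_ok) % 3, frame_kept, frame_lost)
```

Losing or gaining a single base at a junction shifts every downstream codon, which is consistent with the need to sequence candidate clones after the nuclease treatment.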

All six oligonucleotides used in each construction 
were kinased. For the annealing reaction 2 pmole of 
each oligonucleotide were combined in a total volume of 
12 µl. The mixture was incubated at 90 °C for 10 min,
moved to a temperature of approximately 65-70 °C for 10
min, and then allowed to cool gradually to 30-35 °C over
a period of 30-45 min. At the end of this period ligase
buffer (Maniatis et al., 1982) and 1.5 units of
T4-ligase were added, the volume adjusted to 15 µl and the
mixture incubated overnight at 16 °C. The mixture was
then incubated at 65 °C for 5 min after which 2.5 µl of
100 mM NaCl restriction endonuclease buffer (Maniatis et
al., 1982) and 5-10 units each of BamHI and PstI were
added, and the volume adjusted to 25 µl. This digest is to cleave
any concatemers which have formed during the ligation 
step. After digestion for 45 min the reaction was 
extracted with phenol/chloroform, precipitated, and
resuspended in 10 µl, 5 µl of which were ligated with
pUC18 (Yanisch-Perron et al., 1985) which had been
digested with BamHI and PstI and treated with bacterial
alkaline phosphatase. After transformation of bacterial
cells by standard techniques (Maniatis et al., 1982),
recombinant colonies were screened by the method of
Grunstein (1975) using oligonucleotide number 1 end
labeled with 32P. Clones from each version of the GHRF
gene were sequenced, and one clone for each version,
designated pEK7 (containing GHRFL) and pEK8 (containing
GHRFS), were used in further steps (see step 4 in figure
21).

The BamHI-PstI fragments of pEK7 and pEK8 were 
inserted into the AccI site of pBRAT2S1 (Fig. 21, step
5). The details of the treatments done to maintain the
open reading frame are shown in Fig. 22. pEK7 and pEK8
were each cut with both BamHI and PstI, treated with S1
nuclease, and the fragments containing the GHRF encoding
sequences isolated after gel electrophoresis. These
fragments were then separately ligated with pBRAT2S1
which had been cut with AccI and treated with the Klenow
fragment of DNA polymerase I. The resulting clones were
checked for the appropriate orientation of the GHRF
encoding sequences by digestion with StyI, a site for
which had been included in the synthetic sequences for
this purpose, and HindIII. Several clones which proved
to contain inserts in the correct orientation were
sequenced. The latter is necessary because S1 nuclease
digestion cannot always be strictly controlled. One
clone for each of the two GHRF constructions confirmed to
have the correct sequence was used in further steps.
These were designated pEK100 and pEK200 for GHRFL and
GHRFS respectively.

4. Reconstruction of the complete modified AT2S1
gene with its natural promoter

The complete chimeric gene is reconstructed as 
follows (see figure 21): The clone pAT2S1Bg contains a
3.6kb BglII fragment inserted in the cloning vector
pJB65 (Botterman et al., 1987) which encompasses not
only the 1.0kb HindIII fragment containing the coding
region of the gene AT2S1 but sufficient sequences
upstream and downstream of this fragment to contain all
necessary regulatory elements for the proper expression
of the gene. This plasmid is cut with HindIII and the
5.2kb fragment (i.e., that portion of the plasmid not
containing the coding region of AT2S1) is isolated. The
clone pAT2S1 is cut with HindIII and NdeI and the
resulting 320 bp HindIII-NdeI fragment is isolated.
This fragment represents that removed from the modified
2S albumin in the construction of pBRAT2S1 (step 3 of
figure 21) in order to allow the insertion of the
oligonucleotides in step 5 of figure 21 to proceed
without the complications of an extra AccI site. These
two isolated fragments are then ligated in a three way
ligation with the NdeI-HindIII fragments from pEK100 and
pEK200 respectively (figure 21, step 6) containing the
modified coding sequence. Individual transformants can
be screened to check for appropriate orientation of the
reconstructed HindIII fragment within the BglII fragment
using any of a number of sites. The resulting plasmids
pEK502 and pEK6011 consist of a 2S albumin gene modified
only in the hypervariable region, surrounded by the same
flanking sequences and thus the same promoter as the
unmodified gene, the entirety contained on a BglII
fragment.

5. Transformation of plants

The BglII fragment containing the chimeric
gene is inserted into the BglII site of the binary
vector pGSC1703A (fig. 16) (see also Fig. 21 step 6),
used and described in section 3 of Example II. The
resultant plasmid is designated pTAD12. Using standard
procedures (Deblaere et al., 1987), pTAD12 is
transferred to the Agrobacterium strain C58C1Rif
carrying the plasmid pMP90, also used in section 3 of
Example II. This Agrobacterium is then used to transform
plants. Tobacco plants of the strain SR1 are
transformed using standard procedures (Deblaere et al.,
1987). Calli are selected on 100 µg/ml kanamycin, and
resistant calli used to regenerate plants.

The techniques for transformation of Arabidopsis 
thaliana and Brassica napus are such that exactly the 
same construction, in the same vector, can be used. 
After mobilization to Agrobacterium tumefaciens as 
described hereabove, the procedures of Lloyd et al.
(1986) and Klimaszewska et al. (1985) are used for
transformation of Arabidopsis and Brassica respectively.
In each case, as for tobacco, calli can be selected on
100 µg/ml kanamycin, and resistant calli used to
regenerate plants. 

In the case of all three species at an early stage 
of regeneration the regenerants are checked for 
transformation by inducing callus from leaf on media 
supplemented with kanamycin (see also point 6).

6. Screening and analysis of transformed plants 

In the case of all three species, regenerated 
plants are grown to seed. Since different transformed 
plants can be expected to have varying levels of 
expression ("position effects", Jones et al., 1985),
more than one transformant must initially be analyzed.
This can in principle be done at either the RNA or
protein level. In this case seed RNA was prepared as
described in Beachy et al., 1985 and northern blots
carried out using standard techniques (Thomas et al.,
1980). Since in the case of both Brassica and
Arabidopsis use of the entire chimeric gene as a probe
would result in cross hybridization with endogenous
genes, oligonucleotide probes complementary to the
insertion within the 2S albumin were used; one of the
oligonucleotides used to make the construction
can be used. For each species, 1 or 2 individual
plants were chosen for further analysis as disclosed 
below. 

First the copy number of the chimeric gene is 
determined by preparing DNA from leaf tissue of the 
transformed plants (Dellaporta et al., 1983) and probing
with the oligonucleotide used above. 

7. Isolation of GHRF analogs 

A) Purification of the chimeric 2S albumins 

The 2S albumins are purified by high salt
extraction, gel-filtration and reversed-phase HPLC as
described in Example II.

The correct elution times of the chimeric 2S
albumins are determined by immunological techniques
using commercially available (UCB-Bioproducts,
Drogenbos, Belgium) antibodies directed against the
natural GHRF.

B) Cleavage of the chimeric 2S albumin and isolation of 
the GHRF analogs 

The desalted HPLC-purified GHRF containing 2S
albumins are then treated with CNBr (Gross and Witkop,
1961). CNBr will liberate the GHRF analogs with an extra
homoserine/homoserine-lactone still attached to the
COOH-terminus. The GHRF analogs are purified using
classical reversed phase HPLC techniques, as described
in Example II, and their amino acid sequence is
determined using the method described in Example II.
The isolated GHRFS analog is amidated using ammonia,
n-butylamine and n-dodecylamine as described by Kempe et
al., 1986. This results in the described Arg-Hse-NH2
terminus.

The second analog, GHRFL, with the extra methionine-derived
homoserine still present at the carboxyl terminus, is first
treated with carboxypeptidase B, removing the carboxyl
terminal homoserine residue (Ambler, 1972). This results in a
Leu-Gly-COOH terminus. Treatment with the D-amino acid
oxidase in the presence of catalase and ascorbate, as
described in Kreil (1984), converts the glycine-COOH
terminal into the terminal amide -CONH2 and glyoxylic
acid. This set of enzymatic steps results in the final 
amidated GHRFL analog. 

The examples have thus given a complete 
illustration of how 2S-albumin storage proteins can be
modified to incorporate therein an insert encoding
Leu-enkephalin or the Growth Hormone Releasing Factor
followed by the transformation of tobacco, Arabidopsis
and Brassica cells with an appropriate plasmid
containing the corresponding modified precursor nucleic
acid, the regeneration of the transformed plant cells
into corresponding plants, the culture thereof up to the
seed forming stage, the recovery of the seeds, the
isolation therefrom of the hybrid 2S albumin and finally
the recovery of the Leu-enkephalin or the GHRF from said hybrid
protein in a purified form. 

It will readily be appreciated that the invention 
thus provides a breakthrough in the art of genetically 
engineering proteins or polypeptides and of producing 
them in considerable amounts under conditions yielding
them in a configuration that comes close to their
natural ones.

It goes without saying that the invention is not 
limited to the above examples. The person skilled in the 
art will in each case properly select the storage 
proteins to be used for the production of any determined 
polypeptide or peptide of interest, the nature thereof,
e.g. depending on the adequate restriction sites which it
contains in order to accommodate at best the
corresponding DNA insert, the choice of the most
suitable seed specific promoter depending on the
nature of the seed forming plant to be transformed for 
the sake of producing the corresponding hybrid protein 
from which the peptide of interest can ultimately be 
cleaved, recovered and purified. 

There follows a list of bibliographic references 
which have been referred to in the course of the present 
disclosure to the extent that reference has been made to
known methods for achieving some of the process steps 
referred to herein or to general knowledge which has 
been established prior to the performance of this 
invention. 

It is further confirmed 

- that plasmid pGV2260 has been deposited with the
DSM under No. 2799 on December, 1983;

- plasmid pSOYLEA has been deposited with the DSM
under No. 4205 on August 3, 1987;

- plasmid pBN2S1 has been deposited with the DSM
under No. 4205 on August 3, 1987;

- plasmids pMa5-8 and pMc have been deposited with the
DSM under Nos. 4567 and 4566 respectively on May 3, 1988;

- plasmid pAT2S1 has been deposited with the DSM under
No. 4879 on October 7, 1988;

- plasmid pAT2S1Bg has been deposited with the DSM
under No. 4878 on October 7, 1988;

- plasmid pGSC1703A has been deposited with the DSM
under No. 4880 on October 7, 1988;

- plasmid pEK7 has been deposited with the DSM under
No. 4876 on October 7, 1988;

- plasmid pEK8 has been deposited with the DSM under
No. 4877 on October 7, 1988;
nowithstanding the fact that they all consist of 
constructs that the person skilled in the art can 
reproduce them from available genetic material without 
performing any inventive work. 
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2S Albumin As % Of Total Seed Protein 



TABLE 1 

Family, species (common name)                  % 

Compositae 
  Helianthus annuus (sunflower)               62 
Cruciferae 
  Brassica spp. (mustard)                     62 
Linaceae 
  Linum usitatissimum (linseed)               42 
Leguminosae 
  Lupinus polyphyllus (lupin)                 38 
  Arachis hypogaea (peanut)                   20 
Lecythidaceae 
  Bertholletia excelsa (brazil nut)           30 
Liliaceae 
  Yucca spp. (yucca)                          27 
Euphorbiaceae 
  Ricinus communis (castor bean)              44 

From Youle and Huang, 1981 
CLAIMS 

1. A process for producing a determined polypeptide 
of interest in a seed forming plant which comprises: 

- cultivating plants obtained from regenerated 
plant cells or from seeds of plants obtained from 
said regenerated plant cells over one or several 
generations, wherein the genetic patrimony or 
information of said plant cells, replicable 
within said plants, includes a nucleic acid 
sequence, placed under the control of a 
seed-specific promoter, which can be transcribed 
into the mRNA encoding at least part of the 
precursor of a storage protein including the 
signal peptide of said plant, said nucleic acid 
being hereafter referred to as the 
"precursor-coding nucleic acid", 

. wherein said nucleic acid contains a 
nucleotide sequence (hereafter termed the 
"relevant sequence"), which relevant sequence 
comprises a non-essential region modified by a 
heterologous nucleic acid insert forming an 
open-reading frame in reading phase with the 
non-modified parts surrounding said insert in 
said relevant sequence, 

. wherein said insert includes a nucleotide 
segment encoding said polypeptide of interest, 

. wherein said heterologous nucleotide segment 
is linked to the adjacent extremities of the 
surrounding non-modified parts of said 
relevant sequence by one or several codons 
whose nucleotides belong either to said insert 
or to the adjacent extremities or to both, 

. wherein said one or several codons encode one 
or several aminoacid residues which define 
selectively cleavable border sites surrounding 
the peptide of interest in the hybrid storage 
protein or storage protein subunit encoded by 
the modified relevant sequence; 

- recovering the seeds of the cultivated plants and 
extracting the hybrid storage proteins contained 
therein; 

- cleaving out the peptide of interest from said 
hybrid storage protein at the level of said 
cleavage sites; and 

- recovering the peptide of interest in a purified 
form. 

2. The process of claims wherein said polypeptide 
of interest is formed of repeats of a unit consisting of 
a biologically active polypeptide or protein separated 
from one another by selective cleavage sites which allow 
for their separation either during purification or 
subsequently thereto. 

3. The process of claim 1 or claim 2, wherein said 
plant storage protein is water-soluble. 

4. The process of any of claims 1 to 3 wherein said 
plant storage protein is a 2S-protein, such as a 2S 
albumin-storage protein. 

5. The process of any of claims 1 to 4 wherein said 
seed specific promoter belongs in nature to the same 
nucleic acid as the precursor nucleic acid. 

6. The process of any of claims 1 to 5 wherein said 
seed specific promoter is heterologous with respect to 
said precursor nucleic acid. 

7. The process of any of claims 1 to 5, wherein the 
polypeptide of interest is a labeled protein comprising 
atoms selected from the group consisting of carbon with 
mass number 14, nitrogen with mass number 15, hydrogen 
with mass number 3, sulphur with mass number 35 and 
phosphorus with mass number 32. 

8. The process of any of claims 1 to 7, wherein the 
nucleic acid segment which encodes the polypeptide of 
interest is foreign to the natural nucleic acid encoding 
the precursor of said storage protein. 

9. The process of any one of claims 1 to 8, wherein the 
heterologous insert contains a segment as above-defined 
normally present in the genetic patrimony or information 
of said seeds or plant cells, the "heterologous" 
character of said insert then referring to the one or 
several codons which surround it on both sides 
and which link said segment to the non-modified parts of 
the nucleic acid encoding said precursor. 

10. A recombinant DNA which includes a nucleic acid 
sequence, which can be transcribed into the mRNA 
encoding at least part of the precursor of a storage 
protein including the signal peptide of said plant, said 
nucleic acid being hereafter referred to as the 
"precursor-coding nucleic acid": 

. wherein said nucleic acid contains a nucleotide 
sequence (hereafter termed the "relevant 
sequence"), which relevant sequence comprises a 
non-essential region modified by a heterologous 
nucleic acid insert forming an open-reading frame 
in reading phase with the non-modified parts 
surrounding said insert in said relevant 
sequence, 

. wherein said insert includes a nucleotide segment 
encoding said polypeptide of interest, 

. wherein said heterologous nucleotide segment is 
linked to the adjacent extremities of the 
surrounding non-modified parts of said relevant 
sequence by one or several codons whose 
nucleotides belong either to said insert or to 
the adjacent extremities or to both, 

. wherein said one or several codons encode one or 
several aminoacid residues which define 
selectively cleavable border sites surrounding 
the peptide of interest in the hybrid storage 
protein or storage protein subunit encoded by the 
modified relevant sequence. 

11. A recombinant DNA wherein said precursor-coding 
nucleic acid is placed under the control of a 
seed-specific promoter. 

12. The recombinant DNA of claim 11, wherein said 
plant storage protein is a 2S-protein, such as a 
2S-albumin. 

13. The recombinant DNA of any of claims 10 to 12 
which is a plasmid. 

14. The recombinant DNA of claim 13 which is capable 
of transforming plant cells and of causing the 
replication of said modified precursor nucleic acid 
sequence in said plant cells. 

15. The recombinant DNA of claim 14 which is a Ti- 
derived plasmid. 

16. A regenerable source of a polypeptide of 
interest, which is formed of either plant cells of a 
seed-forming plant, which plant cells are capable of 
being regenerated into the full plant, or seeds of said 
seed-forming plants, wherein said plants or seeds have 
been obtained as a result of one or several generations 
of the plants resulting from the regeneration of said 
plant cells, wherein further the DNA supporting the 
genetic information of said plant cells or seeds 
comprises a nucleic acid or part thereof, including the 
sequences encoding the signal peptide, which can be 
transcribed into the mRNA corresponding to the precursor 
of a storage protein of said plant, placed under the 
control of a seed-specific promoter, and 

. wherein said nucleic acid sequence contains a 
relevant modified sequence encoding the mature 
storage protein or one of the several 
sub-sequences encoding the corresponding one or 
several subunits of said mature storage protein, 

. wherein further the modification of said relevant 
sequence takes place in one of its non-essential 
regions and consists of a heterologous nucleic 
acid insert forming an open-reading frame in 
reading phase with the non-modified parts which 
surround said insert in the relevant sequence, 

. wherein said insert includes a nucleotide segment 
encoding said polypeptide of interest, 

. wherein said heterologous nucleotide segment is 
linked to the adjacent extremities of the 
surrounding non-modified parts of said relevant 
sequence by one or several codons whose 
nucleotides belong either to said insert or to 
the adjacent extremities or to both, 

. wherein said one or several codons encode one or 
several aminoacid residues which define 
selectively cleavable border sites surrounding 
the peptide of interest in the hybrid storage 
protein or storage protein subunit encoded by the 
modified relevant sequence. 

17. The source of polypeptide wherein said plant 
storage protein is a 2S-protein, such as a 2S-albumin. 

18. The source of polypeptide of claim 16 or 17, 
wherein said insert is a synthetic man-made 
oligonucleotide. 

19. The source of polypeptide of claim 16, 17 or 18, 
wherein the heterologous segment contained in said 
insert encodes a non plant variety specific polypeptide. 

20. The recombinant DNA of any of claims 10 to 15 
which is in a plant cell environment. 

21. A genetically engineered seed-forming plant 
which normally contains 2S-protein, or part of said plant, 
in which all cells contain the recombinant DNA of any of 
claims 10 to 15. 

22. A genetically engineered seed which normally 
contains 2S-protein and all cells of which contain the 
recombinant DNA of any of claims 10 to 15. 
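
By way of illustration only, and forming no part of the claims, the requirement that the insert be "in reading phase" with the surrounding non-modified parts of the relevant sequence may be sketched in Python as follows. The sequences, the names `translate` and `build_hybrid`, and the choice of a lysine codon (AAG) as border codon are hypothetical and have been chosen solely for the example; only the standard genetic code is relied upon.

```python
# Illustrative sketch only: all sequences and names below are hypothetical.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def build_hybrid(upstream, segment, downstream, border="AAG"):
    """Join the non-modified parts and the segment, flanked by border codons.

    Every part must contain whole codons, otherwise the downstream part
    would be read out of phase.
    """
    for name, part in (("upstream", upstream), ("segment", segment),
                       ("downstream", downstream), ("border", border)):
        if len(part) % 3 != 0:
            raise ValueError(f"{name} would shift the reading frame")
    return upstream + border + segment + border + downstream


upstream = "ATGGCTTGC"        # hypothetical 5' part: Met-Ala-Cys
segment = "TATGGTGGTTTCCTT"   # hypothetical segment: Tyr-Gly-Gly-Phe-Leu
downstream = "TGCGAATAA"      # hypothetical 3' part: Cys-Glu-stop

hybrid = build_hybrid(upstream, segment, downstream)
protein = translate(hybrid)
# In phase: no stop codon before the end, and the original stop is retained.
in_frame = "*" not in protein[:-1] and protein.endswith("*")
print(protein, in_frame)
```

In this sketch a part whose length is not a multiple of three raises a `ValueError`, which corresponds to an insert that would disturb the reading phase of the downstream part of the relevant sequence.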
Figure 21. Flow chart of constructions showing successive steps in the deletion of 
sequences encoding most of the hypervariable region of the Arabidopsis 
2S and their replacement with sequences encoding two GHRF analogs. 
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